


Alpha System Reference Manual 


Version 5 


DIGITAL RESTRICTED DISTRIBUTION 


Doc. #50161 


This document describes the Alpha architecture. 


This information shall not be disclosed to non-Digital personnel or generally distributed
within Digital. Distribution is restricted to persons authorized and designated by the Alpha
Program Office. This document shall not be left unattended, and when not in use shall be
stored in a locked storage container.


Digital Equipment Corporation 
Maynard, Massachusetts 













May 1992 


Digital believes that the information in this publication is accurate as of its publication date; such 
information is subject to change without notice. Digital is not responsible for any inadvertent errors. 


Copyright ©1992 Digital Equipment Corporation 
All rights reserved. Printed in U.S.A. 


The following are trademarks of Digital Equipment Corporation: DEC, OpenVMS, PDP-11, VAX, VMS,
ULTRIX, and the DIGITAL logo.


Cray is a registered trademark of Cray Research, Inc. IBM is a registered trademark of International 
Business Machines Corporation. OSF/1 is a registered trademark of Open Software Foundation, Inc. 
UNIX is a registered trademark of UNIX System Laboratories, Inc.


This document was prepared using VAX DOCUMENT, Version 2.0. 




Preface 


The Alpha System Reference Manual is divided into three parts, four appendixes, and an
index.


Each part or section of a part describes a major portion of the Alpha architecture. 
Each contains its own Table of Contents. Additional sections will be incorporated as 
development proceeds on the architecture. 


The Alpha System Reference Manual is under ECO control. ECOs are approved only 
by the Alpha-A committee. 


The following table outlines the contents of the Alpha SRM: 


Name          Symbol        Contents

Part One      (I)           Common Architecture
                            This part describes the common Alpha architecture.

Part Two      (II), (III)   Specific Operating System PALcode Architecture
                            This part contains sections that describe how the following
                            operating systems relate to the Alpha architecture:

                            Section Name and Contents          Symbol
                            OpenVMS Alpha Software             (II)
                            DEC OSF/1 Alpha Software           (III)

Part Three    (IV)          Platforms
                            This part describes an architected platform implementation.

Appendixes                  Because information in the appendixes can be shared by
                            more than one section, they are grouped together at the end
                            of the manual.

Index                       The index at the end of the manual is structured like
                            a master index. Index entries are called out by the
                            appropriate symbol, (I), (II), and so forth, associated with
                            the corresponding part or section. Index entries for the
                            appendixes are called out by appendix name and page
                            number.


Common Architecture (I)


This part describes the common Alpha architecture and contains the following
chapters:


• Chapter 1, Introduction (I)
• Chapter 2, Basic Architecture (I)
• Chapter 3, Instruction Formats (I)
• Chapter 4, Instruction Descriptions (I)
• Chapter 5, System Architecture and Programming Implications (I)
• Chapter 6, Common PALcode Architecture (I)
• Chapter 7, Console Subsystem Overview (I)
• Chapter 8, Input/Output (I)


Chapter 1
Introduction (I) 


Alpha is a 64-bit load/store RISC architecture that is designed with particular 
emphasis on the three elements that most affect performance: clock speed, multiple 
instruction issue, and multiple processors. 


The Alpha architects examined and analyzed current and theoretical RISC 
architecture design elements and developed high-performance alternatives for the 
Alpha architecture. The architects adopted only those design elements that appeared 
valuable for a projected 25-year design horizon. Thus, Alpha becomes the first 21st 
century computer architecture. 


The Alpha architecture is designed to avoid bias toward any particular operating 
system or programming language. Alpha initially supports the OpenVMS Alpha 
and DEC OSF/1 operating systems, and supports simple software migration from 
applications that run on those operating systems. 


This manual describes in detail how Alpha is designed to be the leadership 64-bit 
architecture of the computer industry. 


1.1 The Alpha Approach to RISC Architecture 


Alpha Is a True 64-Bit Architecture 

Alpha was designed as a 64-bit architecture. All registers are 64 bits in length and 
all operations are performed between 64-bit registers. It is not a 32-bit architecture 
that was later expanded to 64 bits. 


Alpha Is Designed for Very High-Speed Implementations

The instructions are very simple. All instructions are 32 bits in length. Memory
operations are either loads or stores. All data manipulation is done between
registers.


The Alpha architecture facilitates pipelining multiple instances of the same 
operations because there are no special registers and no condition codes. 


The instructions interact with each other only by one instruction writing a register 
or memory and another instruction reading from the same place. That makes it 
particularly easy to build implementations that issue multiple instructions every 
CPU cycle. (The first implementation issues two instructions per cycle.) 


Alpha makes it easy to maintain binary compatibility across multiple 
implementations and easy to maintain full speed on multiple-issue implementations. 
For example, there are no implementation-specific pipeline timing hazards, no load- 
delay slots, and no branch-delay slots. 


Alpha’s Approach to Byte Manipulation

The Alpha architecture does byte shifting and masking with normal 64-bit register-
to-register instructions, crafted to keep instruction sequences short.


Alpha does not include single-byte store instructions. This has several advantages:


• Cache and memory implementations need not include byte shift-and-mask logic,
and sequencer logic need not perform read-modify-write on memory locations.
Such logic is awkward for high-speed implementation and tends to slow down
cache access to normal 32-bit or 64-bit aligned quantities.


• Alpha’s approach to byte manipulation makes it easier to build a high-speed
error-correcting write-back cache, which is often needed to keep a very fast RISC
implementation busy.


• Alpha’s approach can make it easier to pipeline multiple byte operations.
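

The byte-manipulation approach described above can be pictured with a short sketch.
The Python below is illustrative only and is not Alpha code: `memory` is a
hypothetical dictionary of aligned quadwords, and `store_byte` and `load_byte` are
names invented here. It shows one byte being updated using nothing but whole-quadword
loads and stores plus shift-and-mask steps carried out on a register value.

```python
MASK64 = (1 << 64) - 1


def load_byte(memory, addr):
    """Read one byte by loading the enclosing aligned quadword and shifting."""
    quad = memory.get(addr & ~7, 0)        # whole-quadword load
    return (quad >> ((addr & 7) * 8)) & 0xFF


def store_byte(memory, addr, value):
    """Write one byte using only aligned 64-bit loads and stores.

    `memory` is a hypothetical dict that maps 8-byte-aligned addresses to
    64-bit little-endian quadword values.
    """
    base = addr & ~7                       # address of the enclosing quadword
    shift = (addr & 7) * 8                 # bit position of the target byte
    quad = memory.get(base, 0)             # load the whole quadword
    quad &= ~(0xFF << shift) & MASK64      # mask out the old byte in the register
    quad |= (value & 0xFF) << shift        # insert the new byte
    memory[base] = quad                    # store the whole quadword back


mem = {0: 0x1122334455667788}
store_byte(mem, 2, 0xAB)
```

In this model the memory system only ever sees aligned quadword traffic; all of the
byte selection happens in the register value.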


Alpha’s Approach to Arithmetic Traps 

Alpha lets the software implementor determine the precision of arithmetic traps. 
With the Alpha architecture, arithmetic traps (such as overflow and underflow) 
are imprecise—they can be delivered an arbitrary number of instructions after the 
instruction that triggered the trap. Also, traps from many different instructions can 
be reported at once. That makes implementations that use pipelining and multiple 
issue substantially easier to build. 


However, if precise arithmetic exceptions are desired, trap barrier instructions can 
be explicitly inserted in the program to force traps to be delivered at specific points. 


Alpha’s Approach to Multiprocessor Shared Memory 

As viewed from a second processor (including an I/O device), a sequence of reads and 
writes issued by one processor may be arbitrarily reordered by an implementation. 
This allows implementations to use multibank caches, bypassed write buffers, write 
merging, pipelined writes with retry on error, and so forth. If strict ordering 
between two accesses must be maintained, explicit memory barrier instructions can 
be inserted in the program. 


The basic multiprocessor interlocking primitive is a RISC-style load_locked, modify, 
store_conditional sequence. If the sequence runs without interrupt, exception, or 
an interfering write from another processor, then the conditional store succeeds. 
Otherwise, the store fails and the program eventually must branch back and retry 
the sequence. This style of interlocking scales well with very fast caches, and makes 
Alpha an especially attractive architecture for building multiple-processor systems. 
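

The retry pattern of a load_locked, modify, store_conditional sequence can be
sketched as follows. This is a simplified software model, not the hardware
mechanism: the `Cell` class, its `locked_by` flag, and the `interfere` callback are
all assumptions made for illustration.

```python
class Cell:
    """Hypothetical model of one memory location with a single lock flag."""

    def __init__(self, value=0):
        self.value = value
        self.locked_by = None

    def load_locked(self, cpu):
        self.locked_by = cpu               # remember which processor holds the lock
        return self.value

    def store_conditional(self, cpu, new_value):
        if self.locked_by != cpu:          # lock was lost: the store fails
            return False
        self.value = new_value
        self.locked_by = None
        return True

    def write(self, new_value):
        """An ordinary write from another processor clears any lock."""
        self.value = new_value
        self.locked_by = None


def atomic_add(cell, cpu, delta, interfere=None):
    """Retry the sequence until the conditional store succeeds."""
    while True:
        old = cell.load_locked(cpu)
        if interfere is not None:          # simulate one interfering write
            interfere()
            interfere = None
        if cell.store_conditional(cpu, old + delta):
            return old


quiet = Cell(5)
previous = atomic_add(quiet, cpu=0, delta=1)

contended = Cell(0)
seen = atomic_add(contended, cpu=0, delta=1, interfere=lambda: contended.write(10))
```

In the contended case the first attempt fails because of the interfering write, and
the second attempt succeeds against the updated value.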


Alpha Instructions Include Hints for Achieving Higher Speed 


A number of Alpha instructions include hints for implementations, all aimed at 
achieving higher speed. 


• Calculated jump instructions have a target hint that can allow much faster
subroutine calls and returns.


• There are prefetching hints for the memory system that can allow much higher
cache hit rates.


• There are granularity hints for the virtual-address mapping that can allow much
more effective use of translation lookaside buffers for large contiguous structures.


PALcode—Alpha’s Very Flexible Privileged Software Library 

A Privileged Architecture Library (PALcode) is a set of subroutines that are 
specific to a particular Alpha operating system implementation. These subroutines 
provide operating-system primitives for context switching, interrupts, exceptions, 
and memory management. PALcode is similar to the BIOS libraries that are 
provided in personal computers. 


PALcode subroutines are invoked by implementation hardware or by software 
CALL_PAL instructions. 


PALcode is written in standard machine code with some implementation-specific 
extensions to provide access to low-level hardware. 


One version of PALcode lets Alpha implementations run the full OpenVMS operating 
system by mirroring many of the OpenVMS VAX features. The OpenVMS PALcode 
instructions let Alpha run OpenVMS with little more hardware than that found on 
a conventional RISC machine: the PAL mode bit itself, plus 4 extra protection bits 
in each Translation Buffer entry. 


Another version of PALcode lets Alpha implementations run the OSF/1 operating 
system by mirroring many of the RISC ULTRIX features. Other versions of PALcode 
can be developed for real-time, teaching, and other applications. 


PALcode makes Alpha an especially attractive architecture for multiple operating 
systems. 


Alpha and Programming Languages

Alpha is an attractive architecture for compiling a large variety of programming
languages. Alpha has been carefully designed to avoid bias toward one or two
programming languages. For example:


• Alpha does not contain a subroutine call instruction that moves a register window
by a fixed amount. Thus, Alpha is a good match for programming languages with
many parameters and programming languages with no parameters.


• Alpha does not contain a global integer overflow enable bit. Such a bit would
need to be changed at every subroutine boundary when a FORTRAN program
calls a C program.


1.2 Data Format Overview 
Alpha is a load/store RISC architecture with the following data characteristics: 


• All operations are done between 64-bit registers.

• Memory is accessed via 64-bit virtual little-endian byte addresses.

• There are 32 integer registers and 32 floating-point registers.

• Longword (32-bit) and quadword (64-bit) integers are supported.

• Four floating-point data types are supported:
  - VAX F_floating (32-bit)
  - VAX G_floating (64-bit)
  - IEEE single (32-bit)
  - IEEE double (64-bit)


1.3 Instruction Format Overview

As shown in Figure 1-1, Alpha instructions are all 32 bits in length. As represented
in Figure 1-1, there are four major instruction format classes that contain 0, 1, 2,
or 3 register fields. All formats have a 6-bit opcode.


Figure 1-1: Instruction Format Overview

[Figure: 32-bit instruction layouts, including PALcode Format and Branch Format,
drawn against bit positions 31 26 25 21 20 16 15 5 4 0]


• PALcode instructions specify, in the function code field, one of a few dozen
complex operations to be performed.


• Conditional branch instructions test register Ra and specify a signed 21-bit
PC-relative longword target displacement. Subroutine calls put the return
address in register Ra.


• Load and store instructions move longwords or quadwords between register
Ra and memory, using Rb plus a signed 16-bit displacement as the memory
address.


• Operate instructions for floating-point and integer operations are both
represented in Figure 1-1 by the operate format illustration and are as follows:


  - Floating-point operations use Ra and Rb as source registers, and write the
    result in register Rc. There is an 11-bit extended opcode in the function field.


  - Integer operations use Ra and Rb or an 8-bit literal as the source operand,
    and write the result in register Rc.

    Integer operate instructions can use the Rb field and part of the function field
    to specify an 8-bit literal. There is a 7-bit extended opcode in the function
    field.
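

The field widths given above can be illustrated with a small decoding sketch. The
field positions used here are an assumption read from the bit ruler of Figure 1-1
(opcode in <31:26>, Ra in <25:21>, Rb in <20:16>, function in <15:5>, Rc in <4:0>),
and the helper names `bits`, `sign_extend`, and `decode_fields` are invented for this
example. The sketch extracts every field from a word without deciding which format
the word actually uses.

```python
def bits(word, hi, lo):
    """Return the inclusive bit extent <hi:lo> of `word`."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Interpret a `width`-bit field as a two's-complement integer."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def decode_fields(word):
    """Extract every candidate field from one 32-bit instruction word."""
    return {
        "opcode": bits(word, 31, 26),                         # 6-bit opcode
        "ra": bits(word, 25, 21),
        "rb": bits(word, 20, 16),
        "function": bits(word, 15, 5),                        # 11-bit function field
        "rc": bits(word, 4, 0),
        "mem_disp": sign_extend(bits(word, 15, 0), 16),       # signed 16-bit
        "branch_disp": sign_extend(bits(word, 20, 0), 21),    # signed 21-bit
    }


# An operate-format word built by hand from arbitrary field values.
operate = (0x10 << 26) | (1 << 21) | (2 << 16) | (0x20 << 5) | 3
fields = decode_fields(operate)

# A branch-format word whose displacement field is all ones.
branch = (0x30 << 26) | (5 << 21) | 0x1FFFFF
```

A real decoder would first examine the opcode to select a format and then read only
the fields that format defines.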


1.4 Instruction Overview


PALcode Instructions 


As described above, a Privileged Architecture Library (PALcode) is a set of 
subroutines that is specific to a particular Alpha operating-system implementation. 
These subroutines can be invoked by hardware or by software CALL_PAL 
instructions, which use the function field to vector to the specified subroutine. 


Branch Instructions 


Conditional branch instructions can test a register for positive/negative or for
zero/nonzero. They can also test integer registers for even/odd.


Unconditional branch instructions can write a return address into a register. 


There is also a calculated jump instruction that branches to an arbitrary 64-bit 
address in a register. 


Load/Store Instructions 

Load and store instructions move either 32-bit or 64-bit aligned quantities from 
and to memory. Memory addresses are flat 64-bit virtual addresses, with no 
segmentation. 


The VAX floating-point load/store instructions swap words to give a consistent
register format for floating-point operations.


A 32-bit integer datum is placed in a register in a canonical form that makes 33 copies 
of the high bit of the datum. A 32-bit floating-point datum is placed in a register in 
a canonical form that extends the exponent by 3 bits and extends the fraction with 
29 low-order zeros. The 32-bit operates preserve these canonical forms. 
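

The canonical form of a 32-bit integer can be checked with a few lines of Python.
This is a sketch with invented helper names (`canonical_longword`,
`is_canonical_longword`); it models a register as an unsigned 64-bit value and shows
that the 33 copies of the high bit occupy register bits <63:31>.

```python
def canonical_longword(value32):
    """Return the 64-bit register image of a 32-bit integer datum."""
    value32 &= 0xFFFFFFFF
    if value32 & 0x80000000:                   # high bit of the datum is set
        return value32 | 0xFFFFFFFF00000000    # replicate it through bit 63
    return value32


def is_canonical_longword(reg64):
    """True when register bits <63:31> are 33 copies of the same bit."""
    top33 = reg64 >> 31
    return top33 == 0 or top33 == (1 << 33) - 1


negative = canonical_longword(0x80000000)
positive = canonical_longword(0x7FFFFFFF)
```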


There are facilities for doing byte manipulation in registers, eliminating the need 
for 8-bit or 16-bit load/store instructions. 


Compilers, as directed by user declarations, can generate any mixture of 32-bit and 
64-bit operations. The Alpha architecture has no 32/64 mode bit. 


Integer Operate Instructions 


The integer operate instructions manipulate full 64-bit values, and include the usual 
assortment of arithmetic, compare, logical, and shift instructions. 


There are just three 32-bit integer operates: add, subtract, and multiply. They 
differ from their 64-bit counterparts only in overflow detection and in producing 
32-bit canonical results. 


There is no integer divide instruction. 

The Alpha architecture also supports the following additional operations: 

• Scaled add/subtract instructions for quick subscript calculation

• 128-bit multiply for division by a constant, and multiprecision arithmetic

• Conditional move instructions for avoiding branch instructions

• An extensive set of in-register byte and word manipulation instructions
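

Two of the operations listed above lend themselves to a short sketch. The Python
below is illustrative and makes several assumptions: `multiply_high` stands for
taking the upper 64 bits of a 128-bit unsigned product, the constant and shift are
the well-known reciprocal for unsigned division by 10, and `scaled_add4` assumes a
scale factor of 4 (a longword subscript). None of these names come from the
architecture.

```python
MASK64 = (1 << 64) - 1
RECIPROCAL_OF_10 = 0xCCCCCCCCCCCCCCCD     # ceil(2**67 / 10)


def multiply_high(a, b):
    """Upper 64 bits of the 128-bit unsigned product of two 64-bit values."""
    return ((a & MASK64) * (b & MASK64)) >> 64


def divide_by_10(n):
    """Unsigned 64-bit division by the constant 10 without a divide step."""
    return multiply_high(n, RECIPROCAL_OF_10) >> 3


def scaled_add4(index, base):
    """Address of element `index` in an array of 4-byte items at `base`."""
    return (index * 4 + base) & MASK64


samples = [0, 9, 10, 99, 12345678901234567890, MASK64]
quotients = [divide_by_10(n) for n in samples]
```

The multiply-and-shift form replaces a division with one multiply and one shift,
which is why a 128-bit multiply is useful when no integer divide instruction exists.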


Integer overflow trap enable is encoded in the function field of each instruction, 
rather than kept in a global state bit. Thus, for example, both ADDQ/V and ADDQ 
opcodes exist for specifying 64-bit ADD with and without overflow checking. That 
makes it easier to pipeline implementations. 
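

The difference between an add with and without overflow checking can be sketched as
below. This is a simplified model: the `trap_on_overflow` argument stands in for the
per-instruction enable, `ArithmeticTrap` is an invented exception class, and the trap
is raised immediately, whereas real arithmetic traps may be delivered later.

```python
MASK64 = (1 << 64) - 1
SIGN64 = 1 << 63


class ArithmeticTrap(Exception):
    """Hypothetical stand-in for an integer overflow arithmetic trap."""


def add_quadword(a, b, trap_on_overflow=False):
    """64-bit two's-complement add on register images in 0..2**64-1."""
    result = (a + b) & MASK64
    # Signed overflow: both operands differ in sign from the result.
    overflowed = ((a ^ result) & (b ^ result) & SIGN64) != 0
    if trap_on_overflow and overflowed:
        raise ArithmeticTrap("integer overflow")
    return result


wrapped = add_quadword(SIGN64 - 1, 1)          # overflow ignored, result wraps

try:
    add_quadword(SIGN64 - 1, 1, trap_on_overflow=True)
    trapped = False
except ArithmeticTrap:
    trapped = True
```

Because the enable travels with each instruction in this model, no shared state has
to be saved or restored around calls between routines that want different behavior.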


Floating-Point Operate Instructions 

The floating-point operate instructions include four complete sets of VAX and 
IEEE arithmetic instructions, plus instructions for performing conversions between 
floating-point and integer quantities. 


In addition to the operations found in conventional RISC architectures, Alpha
includes conditional move instructions for avoiding branches and merge
sign/exponent instructions for simple field manipulation.


The arithmetic trap enables and rounding mode are encoded in the function field
of each instruction, rather than kept in global state bits. That makes it easier to
pipeline implementations.


1.5 Instruction Set Characteristics 


Alpha instruction set characteristics are as follows:


• All instructions are 32 bits long and have a regular format.


• There are 32 integer registers (R0 through R31), each 64 bits wide. R31 reads
as zero, and writes to R31 are ignored.


• There are 32 floating-point registers (F0 through F31), each 64 bits wide. F31
reads as zero, and writes to F31 are ignored.


• All integer data manipulation is between integer registers, with up to two
variable register source operands (one may be an 8-bit literal), and one register
destination operand.


• All floating-point data manipulation is between floating-point registers, with up
to two register source operands and one register destination operand.


• All memory reference instructions are of the load/store type that move data
between registers and memory.


• There are no branch condition codes. Branch instructions test an integer or
floating-point register value, which may be the result of a previous compare.


• Integer and logical instructions operate on quadwords.


• Floating-point instructions operate on G_floating, F_floating, IEEE double, and
IEEE single operands. D_floating “format compatibility,” in which binary files
of D_floating numbers may be processed, but without the last 3 bits of fraction
precision, is also provided.


• A minimal number of VAX compatibility instructions are included.
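

The behavior of R31 and F31 can be modeled in a few lines. The `RegisterFile` class
below is a hypothetical software model written for illustration; it covers only the
read-as-zero and write-ignored rules stated in the list above.

```python
MASK64 = (1 << 64) - 1


class RegisterFile:
    """Model of 32 registers, each 64 bits wide, where register 31 is zero."""

    ZERO_REGISTER = 31

    def __init__(self):
        self._regs = [0] * 32

    def read(self, number):
        if number == self.ZERO_REGISTER:
            return 0                           # register 31 always reads as zero
        return self._regs[number]

    def write(self, number, value):
        if number != self.ZERO_REGISTER:       # writes to register 31 are ignored
            self._regs[number] = value & MASK64


integer_registers = RegisterFile()
integer_registers.write(1, 42)
integer_registers.write(31, 99)
```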


1.6 Terminology and Conventions

The following sections describe the terminology and conventions used in this book. 


1.6.1 Numbering 


All numbers are decimal unless otherwise indicated. Where there is ambiguity,
numbers other than decimal are indicated with the name of the base in subscript
form, for example, 10₁₆.


1.6.2 Security Holes 


A security hole is an error of commission, omission, or oversight in a system that 
allows protection mechanisms to be bypassed. 


Security holes exist when unprivileged software (that is, software running outside 
of kernel mode) can: 


• Affect the operation of another process without authorization from the operating
system;


• Amplify its privilege without authorization from the operating system; or


• Communicate with another process, either overtly or covertly, without
authorization from the operating system.


The Alpha architecture has been designed to contain no architectural security holes. 
Hardware (processors, buses, controllers, and so on) and software should likewise 
be designed to avoid security holes. 


1.6.3 UNPREDICTABLE And UNDEFINED 


The terms UNPREDICTABLE and UNDEFINED are used throughout this book. 
Their meanings are quite different and must be carefully distinguished. 


In particular, only privileged software (software running in kernel mode) can trigger
UNDEFINED operations. Unprivileged software cannot trigger UNDEFINED
operations. However, either privileged or unprivileged software can trigger
UNPREDICTABLE results or occurrences.


UNPREDICTABLE results or occurrences do not disrupt the basic operation of the
processor; it continues to execute instructions in its normal manner. In contrast,
UNDEFINED operation can halt the processor or cause it to lose information.


The terms UNPREDICTABLE and UNDEFINED can be further described as follows: 


UNPREDICTABLE 


• Results or occurrences specified as UNPREDICTABLE may vary from moment
to moment, implementation to implementation, and instruction to instruction
within implementations. Software can never depend on results specified as
UNPREDICTABLE.


• An UNPREDICTABLE result may acquire an arbitrary value subject to a few
constraints. Such a result may be an arbitrary function of the input operands
or of any state information that is accessible to the process in its current access
mode. UNPREDICTABLE results may be unchanged from their previous values.


Operations that produce UNPREDICTABLE results may also produce exceptions. 


• An occurrence specified as UNPREDICTABLE may happen or not based on an
arbitrary choice function. The choice function is subject to the same constraints
as are UNPREDICTABLE results and, in particular, must not constitute a
security hole.


Specifically, UNPREDICTABLE results must not depend upon, or be a function
of, the contents of memory locations or registers which are inaccessible to the
current process in the current access mode.


Also, operations that may produce UNPREDICTABLE results must not: 


— Write or modify the contents of memory locations or registers to which the 
current process in the current access mode does not have access, or 


— Halt or hang the system or any of its components. 


For example, a security hole would exist if some UNPREDICTABLE result 
depended on the value of a register in another process, on the contents of 
processor temporary registers left behind by some previously running process, 
or on a sequence of actions of different processes. 


UNDEFINED 


• Operations specified as UNDEFINED may vary from moment to moment,
implementation to implementation, and instruction to instruction within
implementations. The operation may vary in effect from nothing, to stopping
system operation.


• UNDEFINED operations may halt the processor or cause it to lose information.
However, UNDEFINED operations must not cause the processor to hang, that
is, reach an unhalted state from which there is no transition to a normal state
in which the machine executes instructions.


1.6.4 Ranges and Extents 


Ranges are specified by a pair of numbers separated by ".." and are inclusive. For
example, a range of integers 0..4 includes the integers 0, 1, 2, 3, and 4.


Extents are specified by a pair of numbers in angle brackets separated by a colon
and are inclusive. For example, bits <7:3> specify an extent of bits including bits 7,
6, 5, 4, and 3.
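

Both notations are inclusive at each end, which differs from the half-open ranges
used in many programming languages. A minimal sketch, with the helper names
`inclusive_range` and `extent` invented for this example:

```python
def inclusive_range(low, high):
    """Expand the manual's low..high notation into a list of integers."""
    return list(range(low, high + 1))


def extent(value, high, low):
    """Extract the bit extent <high:low> from `value`, both ends included."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


members = inclusive_range(0, 4)
field = extent(0b10100101, 7, 3)      # bits 7, 6, 5, 4, and 3
```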


1.6.5 ALIGNED and UNALIGNED 


In this document the terms ALIGNED and NATURALLY ALIGNED are used
interchangeably to refer to data objects that are powers of two in size. An aligned
datum of size 2**N is stored in memory at a byte address that is a multiple of 2**N,
that is, one that has N low-order zeros. Thus, an aligned 64-byte stack frame has a
memory address that is a multiple of 64.


If a datum of size 2**N is stored at a byte address that is not a multiple of 2**N, it
is called UNALIGNED.
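

The alignment rule reduces to a test of the low-order address bits. The following
sketch uses an invented helper name, `is_aligned`, and treats addresses as plain
integers.

```python
def is_aligned(address, size):
    """True when `address` is a multiple of `size`, a power of two in bytes."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of two")
    return (address & (size - 1)) == 0     # the N low-order bits are all zero


frame_ok = is_aligned(128, 64)             # 64-byte stack frame at address 128
frame_bad = is_aligned(130, 64)
```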


1.6.6 Must Be Zero (MBZ)


Fields specified as Must be Zero (MBZ) must never be filled by software with a
non-zero value. These fields may be used at some future time. If the processor encounters
a non-zero value in a field specified as MBZ, an Illegal Operand exception occurs.


1.6.7 Read As Zero (RAZ)


Fields specified as Read as Zero (RAZ) return a zero when read. 


1.6.8 Should Be Zero (SBZ) 


Fields specified as Should be Zero (SBZ) should be filled by software with a zero
value. Non-zero values in SBZ fields produce UNPREDICTABLE results and may
produce extraneous instruction-issue delays.


1.6.9 Ignore (IGN) 
Fields specified as Ignore (IGN) are ignored when written. 


1.6.10 Implementation Dependent (IMP) 


Fields specified as Implementation Dependent (IMP) may be used for
implementation-specific purposes. Each implementation must document fully the behavior of all
fields marked as IMP by the Alpha specification.


1.6.11 Figure Drawing Conventions 


Figures that depict registers or memory follow the convention that increasing 
addresses run right to left and top to bottom. 


NOTE
\A note on the manual format: At certain points 
in the manual, comments on why certain decisions 
were made, unresolved issues, etc., are between a pair 
of backslashes. These comments provide additional 
clarification and will be removed from externally 
distributed editions. \ 


1.6.12 Macro Code Example Conventions 


All instructions in macro code examples are either listed in Chapter 4 or OpenVMS 
Section, Chapter 2, or are stylized code forms found in Appendix A. 


1.7 \Revision History


Revision 5.0, May 12, 1992

1. VMS -> OpenVMS
2. Converted to SDML
3. Removed reference to EVAX


Revision 4.0, March 29, 1991

1. Typos
2. Correct security holes text
3. Upgrade UNPREDICTABLE definition
4. Add Implementation Dependent definition
5. Add new section, Section 1.6.12, Macro Code Example Conventions


Revision 3.0, March 2, 1990

1. Strengthen UNPREDICTABLE definition
2. Add UNALIGNED definition
3. Add Security Hole definition


Revision 2.0, October 4, 1989

1. Change the read as zero, write ignored registers to R31 and F31
2. Update Instruction Set Characteristics for new insert and merge byte instructions


Revision 1.0, May 23, 1989

1. Change MBZ and SBZ definitions


Revision 0.0, March 15, 1988

1. Initial version


Chapter 2
Basic Architecture (I) 


2.1 Addressing 


The basic addressable unit in Alpha is the 8-bit byte. Virtual addresses are 64 
bits long. An implementation may support a smaller virtual address space. The 
minimum virtual address size is 43 bits. 


Virtual addresses as seen by the program are translated into physical memory 
addresses by the memory management mechanism. 


2.2 Data Types 
Following are descriptions of the Alpha architecture data types. 


2.2.1 Byte 


A byte is 8 contiguous bits starting on an addressable byte boundary. The bits are 
numbered from right to left, 0 through 7, as shown in Figure 2-1. 


Figure 2-1: Byte Format 


7 0 


A byte is specified by its address A. A byte is an 8-bit value. The byte is only 
supported in Alpha by the extract, mask, insert, and zap instructions. 


2.2.2 Word 


A word is 2 contiguous bytes starting on an arbitrary byte boundary. The bits are
numbered from right to left, 0 through 15, as shown in Figure 2-2.


Figure 2-2: Word Format


15 0 


A word is specified by its address, the address of the byte containing bit 0. 


A word is a 16-bit value. The word is only supported in Alpha by the extract, mask, 
and insert instructions. 


2.2.3 Longword 


A longword is 4 contiguous bytes starting on an arbitrary byte boundary. The bits 
are numbered from right to left, 0 through 31, as shown in Figure 2-3. 


Figure 2-3: Longword Format 


31 0 


A longword is specified by its address A, the address of the byte containing bit 0. A 
longword is a 32-bit value. 


When interpreted arithmetically, a longword is a two’s-complement integer with bits 
of increasing significance from 0 through 30. Bit 31 is the sign bit. The longword 
is only supported in Alpha by sign-extended load and store instructions and by 
longword arithmetic instructions. 


NOTE 
Alpha implementations will impose a significant 
performance penalty when accessing longword operands 
that are not naturally aligned. (A naturally aligned 
longword has zero as the low-order two bits of its 
address. ) 


2.2.4 Quadword 


A quadword is 8 contiguous bytes starting on an arbitrary byte boundary. The bits
are numbered from right to left, 0 through 63, as shown in Figure 2-4.





Figure 2-4: Quadword Format 


63 0 




A quadword is specified by its address A, the address of the byte containing bit 0. A
quadword is a 64-bit value. When interpreted arithmetically, a quadword is either
a two’s-complement integer with bits of increasing significance from 0 through 62
and bit 63 as the sign bit, or an unsigned integer with bits of increasing significance
from 0 through 63.


NOTE
Alpha implementations will impose a significant
performance penalty when accessing quadword operands that
are not naturally aligned. (A naturally aligned
quadword has zero as the low-order three bits of its address.)


2.2.5 VAX Floating-Point Formats 


VAX floating-point numbers are stored in one set of formats in memory and in a
second set of formats in registers. The floating-point load and store instructions
convert between these formats purely by rearranging bits; no rounding or range-
checking is done by the load and store instructions.


2.2.5.1 F_floating


An F_floating datum is 4 contiguous bytes in memory starting on an arbitrary
byte boundary. The bits are labeled from right to left, 0 through 31, as shown
in Figure 2-5.


Figure 2-5: F_floating Datum 


An F_floating operand occupies 64 bits in a floating register, left-justified in the
64-bit register, as shown in Figure 2-6.


Figure 2-6: F_floating Register Format


63 62 52 51 29 28 0


The F_floating load instruction reorders bits on the way in from memory, expands the
exponent from 8 to 11 bits, and sets the low-order fraction bits to zero. This produces
in the register an equivalent G_floating number suitable for either F_floating or
G_floating operations. The mapping from 8-bit memory-format exponents to 11-bit
register-format exponents is shown in Table 2-1.


Table 2-1: F_floating Load Exponent Mapping

Memory <14:7>     Register <62:52>

1 1111111         1 000 1111111
1 xxxxxxx         1 000 xxxxxxx     (xxxxxxx not all 1’s)
0 xxxxxxx         0 111 xxxxxxx     (xxxxxxx not all 0’s)
0 0000000         0 000 0000000


This mapping preserves both normal values and exceptional values. 
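

The mapping in Table 2-1 can be written as a small function, and the whole load
conversion can be sketched around it. The sketch below is illustrative: the helper
names are invented, and the placement of the fraction in register bits <51:29> is an
assumption read from the bit ruler of Figure 2-6 together with the statement that
the low-order fraction bits are set to zero.

```python
def map_exponent(exp8):
    """Expand an 8-bit memory-format exponent to 11 bits per Table 2-1."""
    high = (exp8 >> 7) & 1                 # top exponent bit is kept
    low7 = exp8 & 0x7F                     # low seven bits are kept
    # The three inserted bits are 111 only for a nonzero exponent below 128.
    middle = 0b111 if (high == 0 and low7 != 0) else 0b000
    return (high << 10) | (middle << 7) | low7


def f_floating_register_image(mem32):
    """Build the 64-bit register image from a 32-bit F_floating memory datum."""
    sign = (mem32 >> 15) & 1               # memory bit 15
    exp8 = (mem32 >> 7) & 0xFF             # memory bits <14:7>
    frac_high = mem32 & 0x7F               # memory bits <6:0>
    frac_low = (mem32 >> 16) & 0xFFFF      # memory bits <31:16>
    return ((sign << 63)
            | (map_exponent(exp8) << 52)
            | (frac_high << 45)            # assumed register bits <51:45>
            | (frac_low << 29))            # assumed register bits <44:29>


one = f_floating_register_image(0x00004080)    # F_floating encoding of 1.0
```

For every nonzero exponent the mapping is equivalent to adding 896, the difference
between the excess-1024 and excess-128 biases, which is why normal values keep their
magnitude.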


The F_floating store instruction reorders register bits on the way to memory and 
does no checking of the low-order fraction bits. Register bits <61:59> and <28:0> are 
ignored by the store instruction. 


An F_floating datum is specified by its address A, the address of the byte containing 
bit 0. The memory form of an F_floating datum is sign magnitude with bit 15 the 
sign bit, bits <14:7> an excess-128 binary exponent, and bits <6:0> and <31:16> 
a normalized 24-bit fraction with the redundant most significant fraction bit not 
represented. Within the fraction, bits of increasing significance are from 16 through 
31 and 0 through 6. The 8-bit exponent field encodes the values 0 through 255. 
An exponent value of 0, together with a sign bit of 0, is taken to indicate that the 
F_floating datum has a value of 0. 


If the result of a VAX floating-point format instruction has a value of zero, the
instruction always produces a datum with a sign bit of 0, an exponent of 0, and
all fraction bits of 0. Exponent values of 1..255 indicate true binary exponents
of -127..127. An exponent value of 0, together with a sign bit of 1, is taken as a
reserved operand. Floating-point instructions processing a reserved operand take an
arithmetic exception. The value of an F_floating datum is in the approximate range
0.29*10**-38..1.7*10**38. The precision of an F_floating datum is approximately
one part in 2**23, typically 7 decimal digits.
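

The memory form described above can be turned into a numeric value with a short
decoding sketch. This is an illustration rather than a definition: the function name
is invented, a reserved operand is reported with a Python `ValueError` instead of an
arithmetic exception, and the fraction is treated as a value in the range 0.5 up to
but not including 1, with the hidden most significant bit restored.

```python
def decode_f_floating(mem32):
    """Return the value of a 32-bit F_floating memory datum as a float."""
    sign = (mem32 >> 15) & 1                   # bit 15
    exponent = (mem32 >> 7) & 0xFF             # bits <14:7>, excess-128
    # Fraction: bits <6:0> are most significant, bits <31:16> least.
    fraction = ((mem32 & 0x7F) << 16) | ((mem32 >> 16) & 0xFFFF)
    if exponent == 0:
        if sign:
            raise ValueError("reserved operand")
        return 0.0                             # exponent 0 with sign 0 means zero
    mantissa = ((1 << 23) | fraction) / (1 << 24)   # restore the hidden bit
    value = mantissa * 2.0 ** (exponent - 128)
    return -value if sign else value


one = decode_f_floating(0x00004080)
minus_one = decode_f_floating(0x0000C080)
two = decode_f_floating(0x00004100)

try:
    decode_f_floating(0x00008000)
    reserved = False
except ValueError:
    reserved = True
```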


NOTE
Alpha implementations will impose a significant
performance penalty when accessing F_floating operands
that are not naturally aligned. (A naturally aligned
F_floating datum has zero as the low-order two bits of its
address.)


2.2.5.2 G_floating 


A G_floating datum in memory is 8 contiguous bytes starting on an arbitrary byte
boundary. The bits are labeled from right to left, 0 through 63, as shown in
Figure 2-7.


Figure 2-7: G_floating Datum 


 15 14                4 3        0

 Fraction Midh                     :A+2
 Fraction Midl                     :A+4
 Fraction Lo                       :A+6





A G_floating operand occupies 64 bits in a floating register, arranged as shown in
Figure 2-8. 


Figure 2-8: G_floating Format 


63 62 52 51 48 47 32 31 16 15 0 


A G_floating datum is specified by its address A, the address of the byte containing 
bit 0. The form of a G_floating datum is sign magnitude with bit 15 the sign bit, bits 
<14:4> an excess-1024 binary exponent, and bits <3:0> and <63:16> a normalized 53- 
bit fraction with the redundant most significant fraction bit not represented. Within 
the fraction, bits of increasing significance are from 48 through 63, 32 through 47, 16 
through 31, and 0 through 3. The 11-bit exponent field encodes the values 0 through 
2047. An exponent value of 0, together with a sign bit of 0, is taken to indicate that 
the G_floating datum has a value of 0. 


If the result of a floating-point instruction has a value of zero, the instruction
always produces a datum with a sign bit of 0, an exponent of 0, and all
fraction bits of 0. Exponent values of 1..2047 indicate true binary exponents of
-1023..1023. An exponent value of 0, together with a sign bit of 1, is taken as a
reserved operand. Floating-point instructions processing a reserved operand take
a user-visible arithmetic exception. The value of a G_floating datum is in the
approximate range 0.56*10**-308..0.9*10**308. The precision of a G_floating datum
is approximately one part in 2**52, typically 15 decimal digits.


NOTE
Alpha implementations will impose a significant performance
penalty when accessing G_floating operands that are not
naturally aligned. (A naturally aligned G_floating datum
has zero as the low-order three bits of its address.)


2.2.5.3 D_floating


A D_floating datum in memory is 8 contiguous bytes starting on an arbitrary byte 
boundary. The bits are labeled from right to left, 0 through 63, as shown in 
Figure 2-9. 


Figure 2-9: D_floating Datum


 15 14

 Fraction Midh                     :A+2
 Fraction Midl                     :A+4
                                   :A+6


A D_floating operand occupies 64 bits in a floating register, arranged as shown in
Figure 2-10.


Figure 2-10: D_floating Register Format 


 63 62      55 54      48 47      32 31      16 15       0


The reordering of bits required for a D_floating load or store is identical to that
required for a G_floating load or store. The G_floating load and store instructions
are therefore used for loading or storing D_floating data.





A D_floating datum is specified by its address A, the address of the byte containing
bit 0. The memory form of a D_floating datum is identical to an F_floating datum
except for 32 additional low significance fraction bits. Within the fraction, bits of
increasing significance are from 48 through 63, 32 through 47, 16 through 31, and 0
through 6. The exponent conventions and approximate range of values are the same
for D_floating as F_floating. The precision of a D_floating datum is approximately
one part in 2**55, typically 16 decimal digits.


NOTE 

D_floating is not a fully supported data type; no 
D_floating arithmetic operations are provided in the 
architecture. For backward compatibility, exact D_ 
floating arithmetic may be provided via software 
emulation. D_floating “format compatibility” in which 
binary files of D_floating numbers may be processed, 
but without the last 3 bits of fraction precision, can 
be obtained via conversions to G_floating, G arithmetic 
operations, then conversion back to D_floating. 


NOTE 
Alpha implementations will impose a significant 
performance penalty on access to D_floating operands 
that are not naturally aligned. (A naturally aligned
D_floating datum has zero as the low-order three bits of
its address.)


2.2.6 IEEE Floating-Point Formats 


The IEEE standard for binary floating-point arithmetic, ANSI/IEEE 754-1985, 
defines four floating-point formats in two groups, basic and extended, each having 
two widths, single and double. The Alpha architecture supports the basic single 
and double formats, with the basic double format serving as the extended single 
format. The values representable within a format are specified by using three integer 
parameters: 


1. P—the number of fraction bits 

2. Emax—the maximum exponent 

3. Emin—the minimum exponent 

Within each format, only the following entities are permitted: 

1. Numbers of the form (-1)**S x 2**E x b(0).b(1)b(2)..b(P-1) where:
   a. S = 0 or 1
   b. E = any integer between Emin and Emax, inclusive
   c. b(n) = 0 or 1

2. Two infinities: positive and negative

3. At least one Signaling NaN

4. At least one Quiet NaN


NaN is an acronym for Not-a-Number. A NaN is an IEEE floating-point bit
pattern that represents something other than a number. NaNs come in two forms:
Signaling NaNs and Quiet NaNs. Signaling NaNs are used to provide values
for uninitialized variables and for arithmetic enhancements. Quiet NaNs provide
retrospective diagnostic information regarding previous invalid or unavailable data
and results. Signaling NaNs signal an invalid operation when they are an operand
to an arithmetic instruction, and may generate an arithmetic exception. Quiet
NaNs propagate through almost every operation without generating an arithmetic
exception.


Arithmetic with the infinities is handled as if the operands were of arbitrarily large
magnitude. Negative infinity is less than every finite number; positive infinity is
greater than every finite number.


2.2.6.1 S_floating


An IEEE single-precision, or S_floating, datum occupies 4 contiguous bytes in 
memory starting on an arbitrary byte boundary. The bits are labeled from right 
to left, 0 through 31, as shown in Figure 2—11. 


Figure 2-11: S_floating Datum 


 15 14          7 6          0


An S_floating operand occupies 64 bits in a floating register, left-justified in the 
64-bit register, as shown in Figure 2-12. 


Figure 2-12: S_floating Register Format 


63 62 52 51 29 28 0 


The S_floating load instruction reorders bits on the way in from memory, expanding
the exponent from 8 to 11 bits, and sets the low-order fraction bits to zero. This
produces in the register an equivalent T_floating number, suitable for either
S_floating or T_floating operations. The mapping from 8-bit memory-format exponents
to 11-bit register-format exponents is shown in Table 2-2.


Table 2-2: S_floating Load Exponent Mapping

Memory <30:23>     Register <62:52>

1 1111111          1 111 1111111
1 xxxxxxx          1 000 xxxxxxx     (xxxxxxx not all 1's)
0 xxxxxxx          0 111 xxxxxxx     (xxxxxxx not all 0's)
0 0000000          0 000 0000000


This mapping preserves both normal values and exceptional values. Note that the 
mapping for all 1’s differs from that of F_floating load, since for S_floating all 1’s is 
an exceptional value and for F_floating all 1’s is a normal value. 
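

The exponent mapping in Table 2-2 can be sketched in a few lines. This is an
illustration under stated assumptions, not a description of any implementation:
the names s_load_exponent and s_load are hypothetical, and the register layout
(sign in bit 63, exponent in <62:52>, fraction in <51:29>, zeros in <28:0>) is
taken from the text and Figure 2-12.

```python
def s_load_exponent(e8: int) -> int:
    """Map an 8-bit S_floating memory exponent to the 11-bit register exponent."""
    if not 0 <= e8 <= 0xFF:
        raise ValueError("exponent must fit in 8 bits")
    high = (e8 >> 7) & 1          # memory bit <30>, copied to register bit <62>
    low7 = e8 & 0x7F              # memory bits <29:23>, copied to <58:52>
    if high:
        middle = 0b111 if low7 == 0x7F else 0b000   # all 1's stays all 1's
    else:
        middle = 0b000 if low7 == 0 else 0b111      # all 0's stays all 0's
    return (high << 10) | (middle << 7) | low7


def s_load(mem32: int) -> int:
    """Form the 64-bit register image for a 32-bit S_floating memory datum."""
    sign = (mem32 >> 31) & 1
    exponent = s_load_exponent((mem32 >> 23) & 0xFF)
    fraction = mem32 & 0x7FFFFF
    # The 23 fraction bits land in <51:29>; bits <28:0> are set to zero.
    return (sign << 63) | (exponent << 52) | (fraction << 29)
```

For the normal exponents 1 through 254 the mapping is equivalent to adding 896,
the difference between the excess-1023 and excess-127 biases, which is why the
register image is a valid T_floating number of the same value.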


The S_floating store instruction reorders register bits on the way to memory and
does no checking of the low-order fraction bits. Register bits <61:59> and <28:0> are
ignored by the store instruction. The S_floating load instruction does no checking of
the input.


The S_floating store instruction does no checking of the data; the preceding operation
should have specified an S_floating result.


An S_floating datum is specified by its address A, the address of the byte containing
bit 0. The memory form of an S_floating datum is sign magnitude with bit 31 the sign
bit, bits <30:23> an excess-127 binary exponent, and bits <22:0> a 23-bit fraction.


The value (V) of an S_floating number is inferred from its constituent sign (S),
exponent (E), and fraction (F) fields as follows:


1. If E=255 and F<>0, then V is NaN, regardless of S.

2. If E=255 and F=0, then V = (-1)**S x Infinity.

3. If 0 < E < 255, then V = (-1)**S x 2**(E-127) x (1.F).

4. If E=0 and F<>0, then V = (-1)**S x 2**(-126) x (0.F).

5. If E=0 and F=0, then V = (-1)**S x 0 (zero).


Floating-point operations on S_floating numbers may take an arithmetic exception 
for a variety of reasons, including invalid operations, overflow, underflow, division 
by zero, and inexact results. 
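

The five value rules for S_floating given above can be written out directly. The
sketch below is illustrative only; the function name s_floating_value is
hypothetical, and the argument is assumed to be the 32-bit memory datum with the
sign in bit 31, the exponent in <30:23>, and the fraction in <22:0>.

```python
import math


def s_floating_value(bits: int) -> float:
    """Return the value V of a 32-bit S_floating datum per rules 1 through 5."""
    s = (bits >> 31) & 1
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    sign = -1.0 if s else 1.0

    if e == 255:
        # Rule 1: NaN regardless of S. Rule 2: signed infinity.
        return math.nan if f != 0 else sign * math.inf
    if e == 0:
        # Rules 4 and 5: denormalized value 0.F, which is signed zero when F=0.
        return sign * 2.0 ** -126 * (f / 2 ** 23)
    # Rule 3: normalized value 1.F with an excess-127 exponent.
    return sign * 2.0 ** (e - 127) * (1 + f / 2 ** 23)
```

The same structure applies to T_floating with E compared against 2047, an
excess-1023 exponent, and a 52-bit fraction.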




NOTE 
Alpha implementations will impose a significant performance
penalty when accessing S_floating operands that are not
naturally aligned. (A naturally aligned S_floating datum
has zero as the low-order two bits of its address.)


2.2.6.2 T_floating


An IEEE double-precision, or T_floating, datum occupies 8 contiguous bytes in 
memory starting on an arbitrary byte boundary. The bits are labeled from right 
to left, 0 through 63, as shown in Figure 2-13.


Figure 2-13: T_floating Datum 


 15 14                4 3        0

                                   :A+4
                                   :A+6


A T_floating operand occupies 64 bits in a floating register, arranged as shown in
Figure 2-14.


Figure 2-14: T_floating Register Format 


63 62 52 51 48 47 32 31 1615 0 


The T_floating load instruction performs no bit reordering on input, nor does it 
perform checking of the input data. 


The T_floating store instruction performs no bit reordering on output. This
instruction does no checking of the data; the preceding operation should have
specified a T_floating result.


A T_floating datum is specified by its address A, the address of the byte containing 
bit 0. The form of a T_floating datum is sign magnitude with bit 63 the sign bit, bits 
<62:52> an excess-1023 binary exponent, and bits <51:0> a 52-bit fraction. 


The value (V) of a T_floating number is inferred from its constituent sign (S), 
exponent (E), and fraction (F) fields as follows: 


1. If E=2047 and F<>0, then V is NaN, regardless of S.

2. If E=2047 and F=0, then V = (-1)**S x Infinity.

3. If 0 < E < 2047, then V = (-1)**S x 2**(E-1023) x (1.F).

4. If E=0 and F<>0, then V = (-1)**S x 2**(-1022) x (0.F).

5. If E=0 and F=0, then V = (-1)**S x 0 (zero).


Floating-point operations on T_floating numbers may take an arithmetic exception
for a variety of reasons, including invalid operations, overflow, underflow, division
by zero, and inexact results.


NOTE 
Alpha implementations will impose a significant per- 
formance penalty when accessing T_floating operands 
that are not naturally aligned. (A naturally aligned
T_floating datum has zero as the low-order three bits of
its address.)


2.2.7 Longword Integer Format in Floating-Point Unit
A longword integer operand occupies 32 bits in memory, arranged as shown in 
Figure 2-15. 


Figure 2-15: Longword Integer Datum


 15 14                           0


A longword integer operand occupies 64 bits in a floating register, arranged as shown 
in Figure 2-16. 


Figure 2—16: Longword Integer Floating-Register Format 


 63 62 61  59 58      45 44      29 28                   0


There is no explicit longword load or store instruction; the S_floating load/store 
instructions are used to move longword data into or out of the floating registers. 
The register bits <61:59> are set by the S_floating load exponent mapping. They are
ignored by S_floating store. They are also ignored in operands of a longword integer 
operate instruction, and they are set to 000 in the result of a longword operate 
instruction. 


The register format bit <62>, "I", in Figure 2-16 is part of the Integer Hi field
in Figure 2-15 and represents the high-order bit of that field. Bits <58:45> of
Figure 2-16 are the remaining bits of the Integer Hi field of Figure 2-15.
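

The placement of a longword in a floating register can be sketched as follows.
The sketch assumes the bit positions given above: longword bits <31:30> appear
in register bits <63:62>, longword bits <29:0> appear in register bits <58:29>,
register bits <61:59> are produced by the S_floating load exponent mapping, and
register bits <28:0> are zero. The function names are hypothetical.

```python
def load_longword_to_fpreg(longword: int) -> int:
    """Model an S_floating load applied to a 32-bit longword integer."""
    longword &= 0xFFFFFFFF
    top2 = (longword >> 30) & 0x3        # longword<31:30> -> register<63:62>
    low30 = longword & 0x3FFFFFFF        # longword<29:0>  -> register<58:29>

    # Register bits <61:59> follow the S_floating load exponent mapping,
    # applied to longword bits <30:23> as if they were an exponent field.
    e8 = (longword >> 23) & 0xFF
    low7 = e8 & 0x7F
    if e8 & 0x80:
        middle = 0b111 if low7 == 0x7F else 0b000
    else:
        middle = 0b000 if low7 == 0 else 0b111

    return (top2 << 62) | (middle << 59) | (low30 << 29)


def store_longword_from_fpreg(register: int) -> int:
    """Model an S_floating store: register bits <61:59> and <28:0> are ignored."""
    return (((register >> 62) & 0x3) << 30) | ((register >> 29) & 0x3FFFFFFF)
```

Because the store ignores bits <61:59>, a longword survives a load followed by
a store unchanged whatever value the exponent mapping placed in those bits.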


NOTE
Alpha implementations will impose a significant
performance penalty when accessing longwords that are
not naturally aligned. (A naturally aligned longword
datum has zero as the low-order two bits of its address.)


2.2.8 Quadword Integer Format in Floating-Point Unit 
A quadword integer operand occupies 64 bits in memory, arranged as shown in 
Figure 2-17. 


Figure 2-17: Quadword Integer Datum 


 15 14                           0

                                   :A+4
                                   :A+6


A quadword integer operand occupies 64 bits in a floating register, arranged as
shown in Figure 2-18.


Figure 2-18: Quadword Integer Floating-Register Format 


63 62 48 47 32 31 16 15 0 


There is no explicit quadword load or store instruction; the T_floating load/store 
instructions are used to move quadword data into or out of the floating registers. 


The T_floating load instruction performs no bit reordering on input. The T_floating 
store instruction performs no bit reordering on output. This instruction does no 
checking of the data; when used to store quadwords, the preceding operation should 
have specified a quadword result. 


NOTE 
Alpha implementations will impose a significant
performance penalty when accessing quadwords that 
are not naturally aligned. (A naturally aligned 
quadword datum has zero as the low-order three bits 
of its address.) 


2.2.9 Data Types with No Hardware Support


The following VAX data types are not directly supported in Alpha hardware. See
the DEC STD 032: VAX Architecture Standard for detailed information on these
data types.


Octaword 

H_floating 

D_floating (except load/store and convert to/from G_floating) 
Variable-Length Bit Field 

Character String 

Trailing Numeric String 

Leading Separate Numeric String 

Packed Decimal String 


2.3 Revision History

Revision 5.0, May 12, 1992

1. Converted to SDML

Revision 4.0, March 29, 1991

1. D_floating point support removed
2. Typos
3. Word definition made homologous to longword, quadword
4. Specify no checking on S_floating load, and T_floating load
5. Removed S_floating Format illustration and text
6. Clarified what is meant by a VAX floating point instruction


Revision 3.0, March 2, 1990 


1. Cosmetic change to floating-point pictures 


Revision 2.0, October 4, 1989 

1. No change 

Revision 1.0, May 23, 1989 

1. Change minimum virtual address size to 40 bits 
2. Change Floating-point register format 

3. Remove alignment warning on word data type

Revision 0.0, March 15, 1989


1. Initial version 


Chapter 3
Instruction Formats (I) 


3.1 Alpha Registers 


Each Alpha processor has a set of registers that hold the current processor state. 
If an Alpha system contains multiple Alpha processors, there are multiple per- 
processor sets of these registers. 


3.1.1 Program Counter 


The Program Counter (PC) is a special register that addresses the instruction stream. 
As each instruction is decoded, the PC is advanced to the next sequential instruction. 
This is referred to as the updated PC. Any instruction that uses the value of the PC 
will use the updated PC. The PC includes only bits <63:2> with bits <1:0> treated as 
RAZ/IGN. This quantity is a longword-aligned byte address. The PC is an implied 
operand on conditional branch and subroutine jump instructions. The PC is not 
accessible as an integer register. 


3.1.2 Integer Registers 
There are 32 integer registers (R0 through R31), each 64 bits wide.


Register R31 is assigned special meaning by the Alpha architecture. When R31 is 
specified as a register source operand, a zero-valued operand is supplied. 


For all cases except the Unconditional Branch and Jump instructions, results of 
an instruction that specifies R31 as a destination operand are discarded. Also, 
it is UNPREDICTABLE whether the other destination operands (implicit and 
explicit) are changed by the instruction. It is implementation dependent to what 
extent the instruction is actually executed once it has been fetched. It is also 
UNPREDICTABLE whether exceptions are signaled during the execution of such 
an instruction. Note, however, that exceptions associated with the instruction fetch 
of such an instruction are always signaled. 


There are some interesting cases involving R31 as a destination:

• STx_C R31,disp(Rb)

  Although this might seem like a good way to zero out a shared location and reset
  the lock_flag, this instruction causes the lock_flag and virtual location {Rbv +
  SEXT(disp)} to become UNPREDICTABLE.

• LDx_L R31,disp(Rb)

  This instruction produces no useful result since it causes both lock_flag and
  locked_physical_address to become UNPREDICTABLE.

Unconditional Branch (BR and BSR) and Jump (JMP, JSR, RET, and JSR_COROUTINE)
instructions, when R31 is specified as the Ra operand, execute normally and update
the PC with the target virtual address. Of course, no PC value can be saved in R31.


3.1.3 Floating-Point Registers 
There are 32 floating-point registers (F0 through F31), each 64 bits wide.


When F31 is specified as a register source operand, a true zero-valued operand is 
supplied. See Section 4.7.2 for a definition of true zero. 


Results of an instruction that specifies F31 as a destination operand are discarded 
and it is UNPREDICTABLE whether the other destination operands (implicit and 
explicit) are changed by the instruction. In this case, it is implementation-dependent 
to what extent the instruction is actually executed once it has been fetched. It is also 
UNPREDICTABLE whether exceptions are signaled during the execution of such an 
instruction. Note, however, that exceptions associated with the instruction fetch of 
such an instruction are always signaled. 


A floating-point instruction that operates on single-precision data reads all bits 
<63:0> of the source floating-point register. A floating-point instruction that 
produces a single-precision result writes all bits <63:0> of the destination floating- 
point register. 


3.1.4 Lock Registers 


There are two per-processor registers associated with the LDx_L and STx_C 
instructions, the lock_flag and the locked_physical_address register. The use of these 
registers is described in Section 4.2. 

3.1.5 Optional Registers
Some Alpha implementations may include optional memory prefetch or VAX 
compatibility processor registers. 

3.1.5.1 Memory Prefetch Registers 


If the prefetch instructions FETCH and FETCH_M are implemented, an 
implementation will include two sets of state prefetch registers used by those 
instructions. The use of these registers is described in Section 4.11. These registers 
are not directly accessible by software and are listed for completeness. 
3.1.5.2 VAX Compatibility Register 


The VAX compatibility instructions RC and RS include the intr_flag register, as 
described in Section 4.12. 


3.2 Notation 


The notation used to describe the operation of each instruction is given as a sequence 
of control and assignment statements in an ALGOL-like syntax. 


3.2.1 Operand Notation


Tables 3—1, 3—2, and 3-3 list the notation for the operands, the operand values, and 
the other expression operands. 


Table 3-1: Operand Notation

Notation    Meaning

Ra          An integer register operand in the Ra field of the instruction.
Rb          An integer register operand in the Rb field of the instruction.
#b          An integer literal operand in the Rb field of the instruction.
Rc          An integer register operand in the Rc field of the instruction.
Fa          A floating-point register operand in the Ra field of the instruction.
Fb          A floating-point register operand in the Rb field of the instruction.
Fc          A floating-point register operand in the Rc field of the instruction.


Table 3-2: Operand Value Notation

Notation    Meaning

Rav         The value of the Ra operand. This is the contents of register Ra.
Rbv         The value of the Rb operand. This could be the contents of register Rb, or a
            zero-extended 8-bit literal in the case of an Operate format instruction.
Fav         The value of the floating point Fa operand. This is the contents of register Fa.
Fbv         The value of the floating point Fb operand. This is the contents of register Fb.


Table 3-3: Expression Operand Notation

Notation        Meaning

IPR_x           Contents of Internal Processor Register x
IPR_SP[mode]    Contents of the per-mode stack pointer selected by mode
PC              Updated PC value
Rn              Contents of integer register n
Fn              Contents of floating-point register n
X[m]            Element m of array X


3.2.2 Instruction Operand Notation


The notation used to describe instruction operands follows from the operand specifier
notation used in the VAX Architecture Standard. Instruction operands are described
as follows:


<name>.<access type><data type>


<name>

Specifies the instruction field (Ra, Rb, Rc, or disp) and register type of the operand
(integer or floating). It can be one of the following:


Name    Meaning

disp    The displacement field of the instruction.
fnc     The PAL function field of the instruction.
Ra      An integer register operand in the Ra field of the instruction.
Rb      An integer register operand in the Rb field of the instruction.
#b      An integer literal operand in the Rb field of the instruction.
Rc      An integer register operand in the Rc field of the instruction.
Fa      A floating-point register operand in the Ra field of the instruction.
Fb      A floating-point register operand in the Rb field of the instruction.
Fc      A floating-point register operand in the Rc field of the instruction.


<access type>

Is a letter denoting the operand access type:


Access Type    Meaning

a              The operand is used in an address calculation to form an effective
               address. The data type code that follows indicates the units of
               addressability (or scale factor) applied to this operand when the
               instruction is decoded.

               For example:

               ".al" means scale by 4 (longwords) to get byte units (used in branch
               displacements); ".ab" means the operand is already in byte units
               (used in load/store instructions).

i              The operand is an immediate literal in the instruction.
r              The operand is read only.
m              The operand is both read and written.
w              The operand is write only.


<data type>

Is a letter denoting the data type of the operand:


Data Type    Meaning

b            Byte
f            F_floating
g            G_floating
l            Longword
q            Quadword
s            IEEE single floating (S_floating)
t            IEEE double floating (T_floating)
w            Word
x            The data type is specified by the instruction


3.2.3 Operators


The operators shown in Table 3—4 are used: 


Table 3-4: Operators

Operator    Meaning

!           Comment delimiter
+           Addition
-           Subtraction
*           Signed multiplication
*U          Unsigned multiplication
**          Exponentiation (left argument raised to right argument)
/           Division
←           Replacement
||          Bit concatenation
{}          Indicates explicit operator precedence
(x)         Contents of memory location whose address is x
x<m:n>      Contents of bit field of x defined by bits n through m


Table 3-4 (Cont.): Operators

Operator                  Meaning

x<m>                      M'th bit of x

ACCESS(x,y)               Accessibility of the location whose address is x using the
                          access mode y. Returns a Boolean value TRUE if the address
                          is accessible, else FALSE.

AND                       Logical product

ARITH_RIGHT_SHIFT(x,y)    Arithmetic right shift of first operand by the second operand.
                          Y is an unsigned shift value. Bit 63, the sign bit, is copied
                          into vacated bit positions and shifted out bits are discarded.

BYTE_ZAP(x,y)             X is a quadword, y is an 8-bit vector in which each bit
                          corresponds to a byte of the result. The y bit to x byte
                          correspondence is y<n> ↔ x<8n+7:8n>. This correspondence
                          also exists between y and the result.

                          For each bit of y from n = 0 to 7, if y<n> is 0 then byte <n>
                          of x is copied to byte <n> of result, and if y<n> is 1 then byte
                          <n> of result is forced to all zeros.

CASE                      The CASE construct selects one of several actions based on
                          the value of its argument. The form of a case is:

                          CASE argument OF
                              argvalue1: action_1
                              argvalue2: action_2
                              ...
                              argvaluen: action_n
                              [otherwise: default_action]
                          ENDCASE

                          If the value of argument is argvalue1 then action_1 is
                          executed; if argument = argvalue2, then action_2 is executed,
                          and so forth.

                          Once a single action is executed, the code stream breaks
                          to the ENDCASE (there is an implicit break as in Pascal).
                          Each action may nonetheless be a sequence of pseudocode
                          operations, one operation per line.

                          Optionally, the last argvalue may be the atom 'otherwise'. The
                          associated default action will be taken if none of the other
                          argvalues match the argument.

DIV                       Integer division (truncates)

LEFT_SHIFT(x,y)           Logical left shift of first operand by the second operand.
                          Y is an unsigned shift value. Zeros are moved into the vacated
                          bit positions, and shifted out bits are discarded.

LOAD_LOCKED               The processor records the target physical address in a
                          per-processor locked_physical_address register and sets the
                          per-processor lock_flag.

lg                        Log to the base 2


Table 3-4 (Cont.): Operators

Operator                Meaning

NOT                     Logical (ones) complement

OR                      Logical sum

x MOD y                 x modulo y

Relational Operators    Operator    Meaning

                        LT          Less than signed
                        LTU         Less than unsigned
                        LE          Less or equal signed
                        LEU         Less or equal unsigned
                        EQ          Equal signed and unsigned
                        NE          Not equal signed and unsigned
                        GE          Greater or equal signed
                        GEU         Greater or equal unsigned
                        GT          Greater signed
                        GTU         Greater unsigned
                        LBC         Low bit clear
                        LBS         Low bit set

MINU(x,y)               Returns the smaller of x and y, with x and y interpreted as
                        unsigned integers

PHYSICAL_ADDRESS        Translation of a virtual address

PRIORITY_ENCODE         Returns the bit position of most significant set bit,
                        interpreting its argument as a positive integer
                        ( = int( lg( x ) ) ).

                        For example:
                        priority_encode( 255 ) = 7

RIGHT_SHIFT(x,y)        Logical right shift of first operand by the second operand. Y
                        is an unsigned shift value. Zeros are moved into vacated bit
                        positions, and shifted out bits are discarded.

SEXT(x)                 X is sign-extended to the required size.

STORE_CONDITIONAL       If the lock_flag is set, then do the indicated store and clear
                        the lock_flag.


Table 3-4 (Cont.): Operators

Operator        Meaning

TEST(x,cond)    The contents of register x are tested for branch condition
                (cond) true. TEST returns a Boolean value TRUE if x bears
                the specified relation to 0, else FALSE is returned. Integer
                and floating test conditions are drawn from the preceding list
                of relational operators.

XOR             Logical difference

ZEXT(x)         X is zero-extended to the required size.
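

Several of the operators in Table 3-4 have a direct executable reading on 64-bit
quantities. The sketch below is an illustration only, not a normative definition:
the lowercase function names are hypothetical, registers are modeled as unsigned
Python integers masked to 64 bits, and sext and zext take an explicit source
width because "the required size" depends on context in the manual.

```python
MASK64 = (1 << 64) - 1


def zext(x: int, bits: int) -> int:
    """Zero-extend the low `bits` bits of x."""
    return x & ((1 << bits) - 1)


def sext(x: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of x, returning a signed Python int."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def left_shift(x: int, y: int) -> int:
    """Logical left shift; bits shifted out of the quadword are discarded."""
    return (x << y) & MASK64


def right_shift(x: int, y: int) -> int:
    """Logical right shift; zeros move into the vacated positions."""
    return (x & MASK64) >> y


def arith_right_shift(x: int, y: int) -> int:
    """Arithmetic right shift; bit 63 is copied into the vacated positions."""
    return (sext(x, 64) >> y) & MASK64


def byte_zap(x: int, y: int) -> int:
    """Force byte n of x to zero wherever bit n of the 8-bit vector y is 1."""
    result = x & MASK64
    for n in range(8):
        if (y >> n) & 1:
            result &= ~(0xFF << (8 * n)) & MASK64
    return result


def priority_encode(x: int) -> int:
    """Bit position of the most significant set bit of a positive integer."""
    if x <= 0:
        raise ValueError("argument must be a positive integer")
    return x.bit_length() - 1


def minu(x: int, y: int) -> int:
    """Smaller of x and y, both interpreted as unsigned 64-bit integers."""
    return min(x & MASK64, y & MASK64)
```

As in the table, priority_encode(255) evaluates to 7.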


3.2.4 Notation Conventions 
The following conventions are used: 
1. Only operands that appear on the left side of a replacement operator are modified. 


2. No operator precedence is assumed other than that replacement (←) has the
   lowest precedence. Explicit precedence is indicated by the use of "{}".


3. All arithmetic, logical, and relational operators are defined in the context of their
   operands. For example, "+" applied to G_floating operands means a G_floating
   add, whereas "+" applied to quadword operands is an integer add. Similarly, "LT"
   is a G_floating comparison when applied to G_floating operands and an integer
   comparison when applied to quadword operands.


3.3 Instruction Formats 


There are five basic Alpha instruction formats: 


• Memory
• Branch
• Operate
• Floating-point Operate
• PALcode


All instruction formats are 32 bits long with a 6-bit major opcode field in bits <31:26>
of the instruction. 


Any unused register field (Ra, Rb, Fa, Fb) of an instruction must be set to a value 
of 31. 


SOFTWARE NOTE
There are several instructions, each formatted as a
memory instruction, that do not use the Ra and/or Rb
fields. These instructions are: Memory Barrier, Fetch,
Fetch_M, Read Process Cycle Counter, Read and Clear,
Read and Set, and Trap Barrier.


3.3.1 Memory Instruction Format 


The Memory format is used to transfer data between registers and memory, to
load an effective address, and for subroutine jumps. It has the format shown in
Figure 3-1.


Figure 3-1: Memory Instruction Format


 31    26 25    21 20    16 15                          0


A Memory format instruction contains a 6-bit opcode field, two 5-bit register address
fields, Ra and Rb, and a 16-bit signed displacement field.


The displacement field is a byte offset. It is sign-extended and added to the contents
of register Rb to form a virtual address. Overflow is ignored in this calculation.


The virtual address is used as a memory load/store address or a result value,
depending on the specific instruction. The virtual address (va) is computed as follows
for all memory format instructions except the load address high (LDAH):


va ← {Rbv + SEXT(Memory_disp)}

For LDAH the virtual address (va) is computed as follows:

va ← {Rbv + SEXT(Memory_disp*65536)}
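

The two address calculations above can be sketched as follows. The field
positions come from Figure 3-1 (opcode in <31:26>, Ra in <25:21>, Rb in <20:16>,
displacement in <15:0>); the function names and the ldah flag are hypothetical,
and 64-bit wraparound models "overflow is ignored".

```python
MASK64 = (1 << 64) - 1


def decode_memory_format(inst: int) -> dict:
    """Split a 32-bit Memory format instruction into its four fields."""
    return {
        "opcode": (inst >> 26) & 0x3F,
        "ra": (inst >> 21) & 0x1F,
        "rb": (inst >> 16) & 0x1F,
        "disp": inst & 0xFFFF,
    }


def sext16(disp: int) -> int:
    """Sign-extend a 16-bit displacement to a signed Python int."""
    disp &= 0xFFFF
    return disp - 0x10000 if disp & 0x8000 else disp


def memory_va(rbv: int, disp: int, ldah: bool = False) -> int:
    """Compute va from Rbv and the displacement; LDAH scales by 65536."""
    offset = sext16(disp) * (65536 if ldah else 1)
    return (rbv + offset) & MASK64   # overflow is ignored
```

For example, with Rbv = 0x1000 and a displacement field of 0xFFFF (that is, -1),
memory_va returns 0xFFF.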

3.3.1.1 Memory Format Instructions with a Function Code


Memory format instructions with a function code replace the memory displacement 
field in the memory instruction format with a function code that designates a set of 
miscellaneous instructions. The format is shown in Figure 3-2.


Figure 3-2: Memory Instruction with Function Code Format 


 31    26 25    21 20    16 15                          0


The memory instruction with function code format contains a 6-bit opcode field and 
a 16-bit function field. Unused function encodings produce UNPREDICTABLE but 
not UNDEFINED results; they are not security holes. 


There are two fields, Ra and Rb. The usage of those fields depends on the instruction.
See Section 4.11.


3.3.1.2 Memory Format Jump Instructions


For computed branch instructions (CALL, RET, JMP, JSR_COROUTINE) the
displacement field is used to provide branch-prediction hints as described in
Section 4.3.


3.3.2 Branch Instruction Format 


The Branch format is used for conditional branch instructions and for PC-relative 
subroutine jumps. It has the format shown in Figure 3-3. 


Figure 3-3: Branch Instruction Format 


 31    26 25    21 20                                   0


A Branch format instruction contains a 6-bit opcode field, one 5-bit register address 
field (Ra), and a 21-bit signed displacement field. 


The displacement is treated as a longword offset. This means it is shifted left two bits
(to address a longword boundary), sign-extended to 64 bits and added to the updated
PC to form the target virtual address. Overflow is ignored in this calculation. The
target virtual address (va) is computed as follows:


va ← PC + {4*SEXT(Branch_disp)}
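

A minimal sketch of the branch target calculation follows. The function name
branch_target is hypothetical; the first argument is the updated PC (the address
of the instruction following the branch), and the second is the raw 21-bit
displacement field.

```python
MASK64 = (1 << 64) - 1


def branch_target(updated_pc: int, disp21: int) -> int:
    """Compute va = PC + 4*SEXT(Branch_disp), ignoring overflow."""
    disp21 &= 0x1FFFFF
    if disp21 & 0x100000:          # bit 20 is the sign of the 21-bit field
        disp21 -= 0x200000
    return (updated_pc + 4 * disp21) & MASK64
```

For a branch located at 0x1000 the updated PC is 0x1004, so a displacement of 0
targets 0x1004 and a displacement of -1 (field value 0x1FFFFF) targets the
branch instruction itself at 0x1000.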


3.3.3 Operate Instruction Format 


The Operate format is used for instructions that perform integer register to integer 
register operations. The Operate format allows the specification of one destination 
operand and two source operands. One of the source operands can be a literal 
constant. The Operate format in Figure 3-4 shows the two cases when bit <12> of 
the instruction is 0 and 1. 


Figure 3-4: Operate Instruction Format 


 26 25    21 20    16 15 13 12 11

 26 25    21 20          13 12 11


An Operate format instruction contains a 6-bit opcode field and a 7-bit function
field. Unused function encodings produce UNPREDICTABLE but not UNDEFINED
results; they are not security holes.


There are three operand fields, Ra, Rb, and Rc. 


The Ra field specifies a source operand. Symbolically, the integer Rav operand is 
formed as follows: 


IF inst<25:21> EQ 31 THEN
    Rav ← 0
ELSE
    Rav ← Ra
END


The Rb field specifies a source operand. Integer operands can specify a literal or an 
integer register using bit <12> of the instruction. 


If bit <12> of the instruction is 0, the Rb field specifies a source register operand. 


If bit <12> of the instruction is 1, an 8-bit zero-extended literal constant is formed 
by bits <20:13> of the instruction. The literal is interpreted as a positive integer 
between 0 and 255 and is zero-extended to 64 bits. Symbolically, the integer Rbv 
operand is formed as follows: 


IF inst<12> EQ 1 THEN
    Rbv ← ZEXT(inst<20:13>)
ELSE
    IF inst<20:16> EQ 31 THEN
        Rbv ← 0
    ELSE
        Rbv ← Rb
    END
END
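The operand-forming rules for Rav and Rbv can be modeled as follows. This is a minimal sketch: the function names and the use of a 32-entry Python list for the integer registers are assumptions made for illustration.

```python
def rav(inst, regs):
    """Form the Rav operand: register 31 always reads as zero."""
    ra = (inst >> 21) & 0x1F            # inst<25:21>
    return 0 if ra == 31 else regs[ra]


def rbv(inst, regs):
    """Form the Rbv operand from a register or an 8-bit literal."""
    if (inst >> 12) & 1:                # inst<12> EQ 1: literal form
        return (inst >> 13) & 0xFF      # ZEXT(inst<20:13>), range 0..255
    rb = (inst >> 16) & 0x1F            # inst<20:16>
    return 0 if rb == 31 else regs[rb]
```

Because the literal is zero-extended, no negative constant can be encoded directly in an Operate format instruction.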


The Rc field specifies a destination operand.


3.3.4 Floating-Point Operate Instruction Format 


The Floating-point Operate format is used for instructions that perform floating- 
point register to floating-point register operations. The Floating-point Operate 
format allows the specification of one destination operand and two source operands. 
The Floating-point Operate format is shown in Figure 3-5.


Figure 3-5: Floating-Point Operate Instruction Format 


 31      26 25    21 20    16 15                5 4      0
|  Opcode  |   Fa   |   Fb   |     Function      |  Fc   |


A Floating-point Operate format instruction contains a 6-bit opcode field and an
11-bit function field. Unused function encodings produce UNPREDICTABLE results,
as defined in Section 1.6.3.


There are three operand fields, Fa, Fb, and Fc. Each operand field specifies either
an integer or floating-point operand as defined by the instruction.


The Fa field specifies a source operand. Symbolically, the Fav operand is formed as 
follows: 


IF inst<25:21> EQ 31 THEN
    Fav ← 0
ELSE
    Fav ← Fa
END


The Fb field specifies a source operand. Symbolically, the Fbv operand is formed as
follows:


IF inst<20:16> EQ 31 THEN
    Fbv ← 0
ELSE
    Fbv ← Fb
END


NOTE
Neither Fa nor Fb can be a literal in Floating-point
Operate instructions.


The Fc field specifies a destination operand. 
3.3.4.1 Floating-Point Convert Instructions 


Floating-point Convert instructions use a subset of the Floating-point Operate 
format and perform register-to-register conversion operations. The Fb operand 
specifies the source; the Fa field must be F31. 


The floating-point register to be used is specified by the Fa, Fb, and Fc fields all
pointing to the same floating-point register. If the Fa, Fb, and Fc fields do not all
point to the same floating-point register, then it is UNPREDICTABLE which register
is used.


3.3.5 PALcode Instruction Format 


The Privileged Architecture Library (PALcode) format is used to specify extended
processor functions. It has the format shown in Figure 3-6.


Figure 3-6: PALcode Instruction Format 


 31      26 25                                            0
|  Opcode  |               PALcode Function                |


The 26-bit PALcode function field specifies the operation. 


The source and destination operands for PALcode instructions are supplied in fixed
registers that are specified in the individual instruction descriptions.


An opcode of zero and a PALcode function of zero specify the HALT instruction.


3.4 Revision History

Revision 5.0, May 12, 1992

1. Removed references to SP and PS
2. Added unsigned multiplication operator
3. Added description of Fa, Fb registers if unused
4. Converted to SDML
5. Added Memory Format with Function Code section
6. Moved Instruction Operand section from Chapter 4
7. Edited description of R31
8. Separated operand notation from operand value notation and simplified language
9. Added comment and note to section 3.3 which specifies value assigned to unused
   register fields of instructions


Revision 4.0, March 29, 1991

1. Typos
2. Upgrade description of R30 and implicit stack behavior of HW/PALcode
3. Upgrade definition of byte_zap, access, left_shift, and right_shift operators
4. Add definition of single bit field select operator, <n>
5. Rename arith_shift operator to arith_right_shift and upgrade definition
6. Make test a dyadic operator with explicit condition argument
7. Define the CASE pseudocode construct
8. Include Processor Status register in description of Alpha registers
9. Add definitions of priority_encode and exponentiation (**) operators
10. Changed text describing R30
11. Changed two relational operator mnemonics

Revision 3.0, March 2, 1990 

1. Under registers, add lock registers, IPRs, and optional registers 

2. Define DIV, BYTE_ZAP, and PHYSICAL_ADDRESS; delete BYTE_SEL 
3. Delete reference to R28 

Revision 2.0, October 4, 1989 

1. Add comment to section on PC that PC is not an Integer Register 

2. Add comment that SP is R30

3. Change description of 'L' field in Operate instruction format

Revision 1.0, May 23, 1989

1. Remove Rb reading as PC for Rb eq 0 

2. Fix error in which bit is literal enable bit for operate format 
3. Add Floating-point Operate format 


Revision 0.0, March 15, 1989 


1. Initial version


Chapter 4
Instruction Descriptions (I)


4.1 Instruction Set Overview 


This chapter describes the instructions implemented by the Alpha architecture. The
instruction set is divided into the following sections:


Instruction Type Section 


Integer load and store 4.2 
Integer control 4.3 
Integer arithmetic 4.4 
Logical and shift 4.5 
Byte manipulation 4.6 
Floating-point load and store 4.8 
Floating-point control 4.9 
Floating-point operate 4.10 
Miscellaneous 4.11 


Within each major section, closely related instructions are combined into groups and 
described together. The instruction group description is composed of the following: 


• The group name

• The format of each instruction in the group, which includes the name, access
  type, and data type of each instruction operand

• The operation of the instruction

• Exceptions specific to the instruction

• The instruction mnemonic and name of each instruction in the group

• Qualifiers specific to the instructions in the group

• A description of the instruction operation

• Optional programming examples and optional notes on the instruction


4.1.1 Subsetting Rules


An instruction that is omitted in a subset implementation of the Alpha architecture
is not performed in either hardware or PALcode. System software may provide
emulation routines for subsetted instructions.


4.1.1.1 Floating-Point Subsets 


Floating-point support is optional on an Alpha processor. An implementation that
supports floating-point must implement the 32 floating-point registers, the
Floating-point Control Register (FPCR) and the instructions to access it, floating-point
branch instructions, floating-point copy sign (CPYSx) instructions, floating-point
convert instructions, floating-point conditional move instruction (FCMOV), and the
S_floating and T_floating memory operations.


SOFTWARE NOTE 

A system that will not support floating-point operations 
is still required to provide the 32 floating-point 
registers, the Floating-point Control Register (FPCR) 
and the instructions to access it, and the T_floating 
memory operations if the system intends to support the 
OpenVMS Alpha operating system. This requirement 
facilitates the implementation of a floating-point 
emulator and simplifies context-switching. 


In addition, floating-point support requires at least one of the following subset
groups:


1. VAX Floating-point Operate and Memory instructions (F_ and G_floating).


2. IEEE Floating-point Operate instructions (S_ and T_floating). Within this group, 
an implementation can choose to include or omit separately the ability to perform 
IEEE rounding to plus infinity and minus infinity. 


Note: if one instruction in a group is provided, all other instructions in that group 
must be provided. An implementation with full floating-point support includes 
both groups; a subset floating-point implementation supports only one of these 
groups. The individual instruction descriptions indicate whether an instruction can 
be subsetted. 


4.1.2 Software Emulation Rules 


General-purpose layered and application software that executes in User mode may
assume that certain loads (LDL, LDQ, LDF, LDG, LDS, and LDT) and certain stores
(STL, STQ, STF, STG, STS, and STT) of unaligned data are emulated by system
software. General-purpose layered and application software that executes in User
mode may assume that subsetted instructions are emulated by system software.
Frequent use of emulation may be significantly slower than using alternative code
sequences.


Emulation of loads and stores of unaligned data and subsetted instructions need
not be provided in privileged access modes. System software that supports
special-purpose dedicated applications need not provide emulation in User mode if
emulation is not needed for correct execution of the special-purpose applications.


4.1.3 Opcode Qualifiers 


Some Operate format and Floating-point Operate format instructions have several 
variants. For example, for the VAX formats, Add F_floating (ADDF) is supported 
with and without floating underflow enabled, and with either chopped or VAX 
rounding. For IEEE formats, IEEE unbiased rounding, chopped, round toward plus 
infinity, and round toward minus infinity can be selected. 


The different variants of such instructions are denoted by opcode qualifiers, which
consist of a slash (/) followed by a string of selected qualifiers. Each qualifier is
denoted by a single character as shown in Table 4-1. The opcodes for each qualifier
are listed in Appendix C.


Table 4-1: Opcode Qualifiers


Qualifier   Meaning

C           Chopped rounding
D           Rounding mode dynamic
M           Round toward minus infinity
I           Inexact result enable
S           Software completion enable
U           Floating underflow enable
V           Integer overflow enable


The default values are normal rounding, software completion disabled, inexact result
disabled, floating underflow disabled, and integer overflow disabled.


4.2 Memory Integer Load/Store Instructions


The instructions in this section move data between the integer registers and memory. 


They use the Memory instruction format. The instructions are summarized in
Table 4-2.


Table 4-2: Memory Integer Load/Store Instructions


Mnemonic    Operation

LDA         Load Address
LDAH        Load Address High
LDL         Load Sign-Extended Longword
LDL_L       Load Sign-Extended Longword Locked
LDQ         Load Quadword
LDQ_L       Load Quadword Locked
LDQ_U       Load Quadword Unaligned
STL         Store Longword
STL_C       Store Longword Conditional
STQ         Store Quadword
STQ_C       Store Quadword Conditional
STQ_U       Store Quadword Unaligned


4.2.1 Load Address


Format:


LDAx    Ra.wq,disp.ab(Rb.ab)                !Memory format


Operation:


Ra ← Rbv + SEXT(disp)                       !LDA
Ra ← Rbv + SEXT(disp*65536)                 !LDAH


Exceptions:


None


Instruction mnemonics:


LDA     Load Address
LDAH    Load Address High


Qualifiers:


None


Description:


The virtual address is computed by adding register Rb to the sign-extended 16-bit 
displacement for LDA, and 65536 times the sign-extended 16-bit displacement for 
LDAH. The 64-bit result is written to register Ra. 
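As an illustration of the two operations, the following Python sketch models LDA and LDAH on 64-bit values. The function names are hypothetical, and the pairing shown in the usage note is one common way to build a constant rather than something this section prescribes.

```python
MASK64 = (1 << 64) - 1


def sext16(disp):
    """Sign-extend a 16-bit displacement field."""
    disp &= 0xFFFF
    return disp - 0x10000 if disp & 0x8000 else disp


def lda(rbv, disp):
    """Ra <- Rbv + SEXT(disp), truncated to 64 bits."""
    return (rbv + sext16(disp)) & MASK64


def ldah(rbv, disp):
    """Ra <- Rbv + SEXT(disp*65536), truncated to 64 bits."""
    return (rbv + sext16(disp) * 65536) & MASK64
```

For example, the value 12348000 (hex) can be formed with `ldah(0, 0x1235)` followed by `lda(result, 0x8000)`: the high part is biased up by one because the low displacement 8000 (hex) is sign-extended to a negative value.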








4.2.2 Load Memory Data into Integer Register


Format:


LDx     Ra.wq,disp.ab(Rb.ab)                !Memory format


Operation:


va ← {Rbv + SEXT(disp)}
Ra ← SEXT((va)<31:0>)                       !LDL
Ra ← (va)<63:0>                             !LDQ


Exceptions: 


Access Violation 
Alignment 

Fault on Read 
Translation Not Valid 


Instruction mnemonics: 


LDL Load Sign-Extended Longword from Memory to Register 


LDQ Load Quadword from Memory to Register 
Qualifiers: 

None 
Description: 


The virtual address is computed by adding register Rb to the sign-extended 16- 
bit displacement. The source operand is fetched from memory, sign-extended, and 
written to register Ra. If the data is not naturally aligned, an alignment exception 
is generated. 




4.2.3 Load Unaligned Memory Data into Integer Register 


Format: 


LDQ_U   Ra.wq,disp.ab(Rb.ab)                !Memory format


Operation:


va ← {{Rbv + SEXT(disp)} AND NOT 7}
Ra ← (va)<63:0>


Exceptions: 


Access Violation 
Fault on Read 
Translation Not Valid 


Instruction mnemonics: 


LDQ_U   Load Unaligned Quadword from Memory to Register


Qualifiers: 
None 


Description: 


The virtual address is computed by adding register Rb to the sign-extended 16- 
bit displacement, then the low-order three bits are cleared. The source operand is 
fetched from memory and written to register Ra. 
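The address calculation for LDQ_U can be sketched as follows. The function name is hypothetical; the sketch only models the clearing of the three low-order address bits, not the memory access itself.

```python
MASK64 = (1 << 64) - 1


def sext16(disp):
    """Sign-extend a 16-bit displacement field."""
    disp &= 0xFFFF
    return disp - 0x10000 if disp & 0x8000 else disp


def ldq_u_address(rbv, disp):
    """va <- {Rbv + SEXT(disp)} AND NOT 7."""
    return ((rbv + sext16(disp)) & MASK64) & ~7
```

Any byte address within an aligned quadword therefore selects the same quadword, which is why no alignment exception is listed for this instruction.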
4.2.4 Load Memory Data into Integer Register Locked


Format: 


LDx_L Ra.wq,disp.ab(Rb.ab) !Memory format 


Operation: 


va ← {Rbv + SEXT(disp)}
lock_flag ← 1
locked_physical_address ← PHYSICAL_ADDRESS(va)
Ra ← SEXT((va)<31:0>)                       !LDL_L
Ra ← (va)<63:0>                             !LDQ_L


Exceptions: 


Access Violation 
Alignment 

Fault on Read 
Translation Not Valid 


Instruction mnemonics: 


LDL_L Load Sign-Extended Longword from Memory to Register Locked 
LDQ_L Load Quadword from Memory to Register Locked


Qualifiers: 


None 


Description: 


The virtual address is computed by adding register Rb to the sign-extended 16-bit 
displacement. The source operand is fetched from memory, sign-extended for LDL_ 
L, and written to register Ra. 


When a LDx_L instruction is executed without faulting, the processor records the
target physical address in a per-processor locked_physical_address register and sets
the per-processor lock_flag.


If the per-processor lock_flag is (still) set when a STx_C instruction is executed, the 
store occurs; otherwise, it does not occur, as described for the STx_C instructions. 


If processor A's lock_flag is set and processor B successfully does a store within A's
locked range of physical addresses, then A's lock_flag is cleared. A processor's locked
range is the aligned block of 2**N bytes that includes the locked_physical_address.
The 2**N value is implementation dependent. It is at least 8 (minimum lock range
is an aligned quadword) and is at most the page size for that implementation
(maximum lock range is one physical page).


A processor’s lock_flag is also cleared if that processor encounters a CALL_PAL REI 
instruction. It is UNPREDICTABLE whether or not a processor’s lock_flag is cleared 
on any other CALL_PAL instruction. It is UNPREDICTABLE whether a processor’s 
lock_flag is cleared by that processor’s executing a normal load or store instruction. 
It is UNPREDICTABLE whether a processor’s lock_flag is cleared by that processor’s 
executing a taken branch (including BR, BSR, and Jumps); conditional branches that 
fall through do not clear the lock_flag. 


The sequence LDx_L, modify, STx_C, BEQ xxx executed on a given processor does an 
atomic read-modify-write of a datum in shared memory if the branch falls through; 
if the branch is taken, the store did not modify memory and the sequence may be 
repeated until it succeeds. 
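The interaction between lock_flag, the locked range, and a retried LDx_L/STx_C sequence can be illustrated with a small Python model. This is a toy sketch under stated assumptions: the `Processor` class, the `atomic_add` helper, the shared dictionary used as memory, and the `interfere` callback are all hypothetical, physical addresses are taken to equal virtual addresses, and stores by other processors are modeled only through STQ_C.

```python
MASK64 = (1 << 64) - 1


class Processor:
    """Toy model of one processor's lock_flag and locked_physical_address."""

    def __init__(self, memory, lock_range=8):
        self.memory = memory            # shared dict: address -> quadword
        self.lock_range = lock_range    # 2**N bytes; 8 is the minimum range
        self.lock_flag = 0
        self.locked_physical_address = None
        self.peers = []                 # other processors sharing memory

    def in_locked_range(self, addr):
        base = self.locked_physical_address & ~(self.lock_range - 1)
        return base <= addr < base + self.lock_range

    def ldq_l(self, addr):
        """LDQ_L: record the address, set lock_flag, return the datum."""
        self.lock_flag = 1
        self.locked_physical_address = addr
        return self.memory.get(addr, 0)

    def stq_c(self, addr, value):
        """STQ_C: store only if lock_flag is set; return the old lock_flag."""
        success = self.lock_flag
        if success:
            self.memory[addr] = value & MASK64
            # A successful store clears the lock_flag of any other
            # processor whose locked range includes this address.
            for peer in self.peers:
                if peer.lock_flag and peer.in_locked_range(addr):
                    peer.lock_flag = 0
        self.lock_flag = 0
        return success


def atomic_add(cpu, addr, delta, interfere=None):
    """Repeat LDQ_L, modify, STQ_C until the store succeeds.

    `interfere` is an optional callback run once, between the first load
    and store, to simulate another processor's activity.
    Returns the number of attempts needed.
    """
    attempts = 0
    while True:
        attempts += 1
        old = cpu.ldq_l(addr)
        if interfere is not None and attempts == 1:
            interfere()
        if cpu.stq_c(addr, old + delta):
            return attempts
```

In this model, a store by processor B between A's LDQ_L and STQ_C makes A's first STQ_C fail, and A's second attempt then operates on the value B wrote.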


Notes: 


• LDx_L instructions do not check for write access; hence a matching STx_C may
  take an access-violation or fault-on-write exception.


• Executing a LDx_L instruction on one processor does not affect any
  architecturally visible state on another processor, and in particular cannot cause
  a STx_C on another processor to fail.


• LDx_L and STx_C instructions need not be paired. In particular, an LDx_L may
  be followed by a conditional branch: on the fall-through path an STx_C is done,
  whereas on the taken path no matching STx_C is done.


• If two LDx_L instructions execute with no intervening STx_C, the second one
  overwrites the state of the first one. If two STx_C instructions execute with no
  intervening LDx_L, the second one always fails because the first clears lock_flag.


• Software will not emulate unaligned LDx_L instructions.


• If any other memory access (LDx, LDQ_U, STx, STQ_U) is done on the given
  processor between the LDx_L and the STx_C, the sequence above may always
  fail on some implementations; hence, no useful program should do this.


• If a branch is taken between the LDx_L and the STx_C, the sequence above may
  always fail on some implementations; hence, no useful program should do this.
  (CMOVxx may be used to avoid branching.)


• If a subsetted instruction (for example, floating-point) is done between the LDx_L
  and the STx_C, the sequence above may always fail on some implementations,
  because of the Illegal Instruction Trap; hence, no useful program should do this.


• If a large number of instructions are executed between the LDx_L and the STx_C,
  the sequence above may always fail on some implementations, because of a timer
  interrupt always clearing the lock_flag before the sequence completes; hence, no
  useful program should do this.


• Hardware implementations are encouraged to lock no more than 128 bytes.
  Software implementations are encouraged to separate locked locations by at
  least 128 bytes from other locations that could potentially be written by another
  processor while the first location is locked.


IMPLEMENTATION NOTES 
Implementations that impede the mobility of a cache 
block on LDx_L, such as that which may occur in a Read 
for Ownership cache coherency protocol, may release the 
cache block and make the subsequent STx_C fail if a 
branch-taken or memory instruction is executed on that 
processor. 


All implementations should guarantee that at least 
40 non-subsetted operate instructions can be executed 
between timer interrupts.


4.2.5 Store Integer Register Data into Memory Conditional


Format:


STx_C   Ra.mq,disp.ab(Rb.ab)                !Memory format


Operation:


va ← {Rbv + SEXT(disp)}

IF lock_flag EQ 1 THEN
    (va)<31:0> ← Rav<31:0>                  !STL_C
    (va) ← Rav                              !STQ_C
END
Ra ← lock_flag
lock_flag ← 0


Exceptions: 


Access Violation 
Fault on Write 
Alignment 
Translation Not Valid 


Instruction mnemonics:


STL_C Store Longword from Register to Memory Conditional 
STQ_C Store Quadword from Register to Memory Conditional 


Qualifiers: 


None 


Description: 


The virtual address is computed by adding register Rb to the sign-extended 16-bit
displacement. If the lock_flag is set, the Ra operand is written to memory at this
address. (See the LDx_L description for conditions that clear the lock_flag.) The
lock_flag is returned in Ra and then set to zero.


Notes:


• Software will not emulate unaligned STx_C instructions.


• Each implementation must do the test and store atomically, so that if two
  processors execute store conditionals within the same lock range, exactly one
  of the stores succeeds.


• The following sequence should not be used:


  try_again: LDQ_L   R1, x
             <modify R1>
             STQ_C   R1, x
             BEQ     R1, try_again


That sequence penalizes performance when the STQ_C succeeds, because the 
sequence contains a backward branch, which is predicted to be taken in the 
Alpha architecture. In the case where the STQ_C succeeds and the branch 
will actually fall through, that sequence incurs unnecessary delay due to a 
mispredicted backward branch. Instead, a forward branch should be used to 
handle the failure case as shown in Section 5.5.2. 


SOFTWARE NOTE 
The address specified by a STx_C instruction need not 
match that given in a preceding LDx_L. Specifying 
unmatched addresses for those instructions requires an 
MB in between to guarantee ordering. 


IMPLEMENTATION NOTES 
A STx_C must propagate to the point of coherency, 
where it is guaranteed to prevent any other store from 
changing the state of the lock bit, before its outcome can 
be determined. 


If an implementation could encounter a TB or cache miss 
on the data reference of the STx_C in the sequence above 
(as might occur in some shared I- and D-stream direct- 
mapped TBs/caches), it must be able to resolve the miss 
and complete the store without always failing.


4.2.6 Store Integer Register Data into Memory


Format:


STx     Ra.rq,disp.ab(Rb.ab)                !Memory format


Operation:


va ← {Rbv + SEXT(disp)}
(va)<31:0> ← Rav<31:0>                      !STL
(va) ← Rav                                  !STQ


Exceptions: 


Access Violation 
Fault on Write 
Alignment 
Translation Not Valid 


Instruction mnemonics: 


STL Store Longword from Register to Memory 
STQ Store Quadword from Register to Memory 
Qualifiers: 
None 
Description: 


The virtual address is computed by adding register Rb to the sign-extended 16-bit 
displacement. The Ra operand is written to memory at this address. If the data is 
not naturally aligned, an alignment exception is generated.


4.2.7 Store Unaligned Integer Register Data into Memory


Format:


STQ_U   Ra.rq,disp.ab(Rb.ab)                !Memory format


Operation:


va ← {{Rbv + SEXT(disp)} AND NOT 7}
(va)<63:0> ← Rav<63:0>


Exceptions: 


Access Violation 
Fault on Write 
Translation Not Valid 


Instruction mnemonics: 


STQ_U Store Unaligned Quadword from Register to Memory 


Qualifiers: 
None 


Description: 


The virtual address is computed by adding register Rb to the sign-extended 16-bit
displacement, then clearing the low-order three bits. The Ra operand is written to
memory at this address.


4.3 Control Instructions


Alpha provides integer conditional branch, unconditional branch, branch to 
subroutine, and jump instructions. The PC used in these instructions is the updated 
PC, as described in Section 3.1.1. 


To allow implementations to achieve high performance, the Alpha architecture 
includes explicit hints based on a branch-prediction model: 


1. For many implementations of computed branches (JSR/RET/JMP), there is a 
substantial performance gain in forming a good guess of the expected target I- 
cache address before register Rb is accessed. 


2. For many implementations, the first-level (or only) I-cache is no bigger than a
page (8 KB to 64 KB).


3. Correctly predicting subroutine returns is important for good performance. Some 
implementations will therefore keep a small stack of predicted subroutine return 
I-cache addresses. 


The Alpha architecture provides three kinds of branch-prediction hints: likely target 
address, return-address stack action, and conditional branch-taken. 


For computed branches, the otherwise unused displacement field contains a function 
code (JMP/JSR/RET/JSR_COROUTINE), and, for JSR and JMP, a field that 
statically specifies the 16 low bits of the most likely target address. The PC- 
relative calculation using these bits can be exactly the PC-relative calculation used 
in unconditional branches. The low 16 bits are enough to specify an I-cache block 
within the largest possible Alpha page and hence are expected to be enough for 
branch-prediction logic to start an early I-cache access for the most likely target. 


For all branches, hint or opcode bits are used to distinguish simple branches, 
subroutine calls, subroutine returns, and coroutine links. These distinctions allow 
branch-predict logic to maintain an accurate stack of predicted return addresses. 


For conditional branches, the sign of the target displacement is used as a
taken/fall-through hint. The instructions are summarized in Table 4-3.


Table 4-3: Control Instructions Summary


Mnemonic        Operation

BEQ             Branch if Register Equal to Zero
BGE             Branch if Register Greater Than or Equal to Zero
BGT             Branch if Register Greater Than Zero
BLBC            Branch if Register Low Bit Is Clear
BLBS            Branch if Register Low Bit Is Set
BLE             Branch if Register Less Than or Equal to Zero
BLT             Branch if Register Less Than Zero
BNE             Branch if Register Not Equal to Zero
BR              Unconditional Branch
BSR             Branch to Subroutine
JMP             Jump
JSR             Jump to Subroutine
RET             Return from Subroutine
JSR_COROUTINE   Jump to Subroutine Return


4.3.1 Conditional Branch


Format:


Bxx     Ra.rq,disp.al                       !Branch format


Operation:


{update PC}
va ← PC + {4*SEXT(disp)}
IF TEST(Rav, Condition_based_on_Opcode) THEN
    PC ← va


Exceptions: 


None 


Instruction mnemonics:


BEQ     Branch if Register Equal to Zero
BGE     Branch if Register Greater Than or Equal to Zero
BGT     Branch if Register Greater Than Zero
BLBC    Branch if Register Low Bit Is Clear
BLBS    Branch if Register Low Bit Is Set
BLE     Branch if Register Less Than or Equal to Zero
BLT     Branch if Register Less Than Zero
BNE     Branch if Register Not Equal to Zero
Qualifiers: 
None 
Description: 


Register Ra is tested. If the specified relationship is true, the PC is loaded with
the target virtual address; otherwise, execution continues with the next sequential
instruction.


The displacement is treated as a signed longword offset. This means it is shifted 
left two bits (to address a longword boundary), sign-extended to 64 bits, and added 
to the updated PC to form the target virtual address. 


The conditional branch instructions are PC-relative only. The 21-bit signed
displacement gives a forward/backward branch distance of +/- 1M instructions.


The test is on the signed quadword integer interpretation of the register contents;
all 64 bits are tested.


Notes: 


• Forward conditional branches (positive displacement) are predicted to fall
  through. Backward conditional branches (negative displacement) are predicted
  to be taken. Conditional branches do not affect a predicted return address stack.
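The branch tests and the static prediction rule can be modeled in Python as follows. The table of lambdas and the function names are hypothetical, and treating a zero displacement as "predicted to fall through" is an assumption based on using only the sign bit of the displacement.

```python
MASK64 = (1 << 64) - 1


def as_signed64(value):
    """Interpret a 64-bit register value as a signed quadword."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


# Condition tested by each conditional branch opcode.
CONDITIONS = {
    "BEQ": lambda v: v == 0,
    "BNE": lambda v: v != 0,
    "BLT": lambda v: v < 0,
    "BLE": lambda v: v <= 0,
    "BGT": lambda v: v > 0,
    "BGE": lambda v: v >= 0,
    "BLBC": lambda v: (v & 1) == 0,
    "BLBS": lambda v: (v & 1) == 1,
}


def branch_taken(mnemonic, rav):
    """Apply the opcode's test to all 64 bits of Rav."""
    return CONDITIONS[mnemonic](as_signed64(rav))


def predicted_taken(disp21):
    """Static hint: a negative (backward) displacement predicts taken."""
    return bool(disp21 & 0x100000)      # sign bit of the 21-bit field
```

For instance, a register holding 8000000000000000 (hex) is negative as a signed quadword, so BLT is taken and BGT is not.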
4.3.2 Unconditional Branch


Format:


BxR     Ra.wq,disp.al                       !Branch format


Operation:


{update PC}
Ra ← PC
PC ← PC + {4*SEXT(disp)}


Exceptions: 


None 


Instruction mnemonics: 


BR Unconditional Branch 
BSR Branch to Subroutine 
Qualifiers:
None
Description: 


The PC of the following instruction (the updated PC) is written to register Ra, and 
then the PC is loaded with the target address. 


The displacement is treated as a signed longword offset. This means it is shifted
left two bits (to address a longword boundary), sign-extended to 64 bits, and added
to the updated PC to form the target virtual address.


The unconditional branch instructions are PC-relative. The 21-bit signed
displacement gives a forward/backward branch distance of +/- 1M instructions.


PC-relative addressability can be established by:


        BR      Rx,L1
L1:
Notes: 


• BR and BSR do identical operations. They only differ in hints to possible branch-
  prediction logic. BSR is predicted as a subroutine call (pushes the return address
  on a branch-prediction stack), whereas BR is predicted as a branch (no push).


4.3.3 Jumps


Format:


mnemonic        Ra.wq,(Rb.ab),hint          !Memory format


Operation:


{update PC}
va ← Rbv AND {NOT 3}
Ra ← PC
PC ← va


Exceptions: 


None 


Instruction mnemonics:


JMP             Jump
JSR             Jump to Subroutine
RET             Return from Subroutine
JSR_COROUTINE   Jump to Subroutine Return


Qualifiers: 


None 


Description: 


The PC of the instruction following the Jump instruction (the updated PC) is written
to register Ra, and then the PC is loaded with the target virtual address.


The new PC is supplied from register Rb. The low two bits of Rb are ignored. Ra 
and Rb may specify the same register; the target calculation using the old value is 
done before the new value is assigned. 


All Jump instructions do identical operations. They only differ in hints to possible
branch-prediction logic. The displacement field of the instruction is used to pass this
information. The four different "opcodes" set different bit patterns in disp<15:14>,
and the hint operand sets disp<13:0>.


These bits are intended to be used as shown in Table 4-4.


Table 4-4: Jump Instructions Branch Prediction


                                Predicted                Prediction
disp<15:14>     Meaning         Target<15:0>             Stack Action

00              JMP             PC + {4*disp<13:0>}      --
01              JSR             PC + {4*disp<13:0>}      Push PC
10              RET             Prediction stack         Pop
11              JSR_COROUTINE   Prediction stack         Pop, push PC


The design in Table 4-4 allows specification of the low 16 bits of a likely longword
target address (enough bits to start a useful I-cache access early), and also allows
distinguishing call from return (and from the other two less frequent operations).


Note that the above information is used only as a hint; correct setting of these bits 
can improve performance but is not needed for correct operation. See Appendix A 
for more information on branch prediction. 
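How branch-prediction logic might interpret the displacement field of a Jump instruction can be sketched as below. This is illustrative only: the function and table names are hypothetical, the PC in the target formula is assumed to be the updated PC, and a real predictor is free to ignore these bits entirely.

```python
HINT_KINDS = {0: "JMP", 1: "JSR", 2: "RET", 3: "JSR_COROUTINE"}


def decode_jump_hint(updated_pc, disp16):
    """Return (kind, predicted_target_low16) for a Jump's hint field.

    predicted_target_low16 is None when the prediction would come from
    the return-address prediction stack (RET and JSR_COROUTINE).
    """
    kind = HINT_KINDS[(disp16 >> 14) & 0x3]          # disp<15:14>
    if kind in ("JMP", "JSR"):
        hint = disp16 & 0x3FFF                       # disp<13:0>
        predicted = (updated_pc + 4 * hint) & 0xFFFF # Target<15:0>
    else:
        predicted = None
    return kind, predicted
```

The hint affects only the prediction; the actual target always comes from register Rb.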


An unconditional long jump can be performed by:


        JMP     R31,(Rb),hint


Coroutine linkage can be performed by specifying the same register in both the Ra
and Rb operands. When disp<15:14> equals '10' (RET) or '11' (JSR_COROUTINE)
(that is, the target address prediction, if any, would come from a predictor
implementation stack), then bits <13:0> are reserved for software and must be
ignored by all implementations. All encodings for bits <13:0> are used by Digital
software or Reserved to Digital, as follows:


Encoding        Meaning

0000 (hex)      Indicates non-procedure return
0001 (hex)      Indicates procedure return
All others      Reserved to Digital


4.4 Integer Arithmetic Instructions


The integer arithmetic instructions perform add, subtract, multiply, and signed and
unsigned compare operations.


The integer instructions are summarized in Table 4-5.


Table 4-5: Integer Arithmetic Instructions Summary


Mnemonic    Operation

ADD         Add Quadword/Longword
S4ADD       Scaled Add by 4
S8ADD       Scaled Add by 8
CMPEQ       Compare Signed Quadword Equal
CMPLT       Compare Signed Quadword Less Than
CMPLE       Compare Signed Quadword Less Than or Equal
CMPULT      Compare Unsigned Quadword Less Than
CMPULE      Compare Unsigned Quadword Less Than or Equal
MUL         Multiply Quadword/Longword
UMULH       Multiply Quadword Unsigned High
SUB         Subtract Quadword/Longword
S4SUB       Scaled Subtract by 4
S8SUB       Scaled Subtract by 8


There is no integer divide instruction. Division by a constant can be done via
UMULH; division by a variable can be done via a subroutine. See Appendix A.
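As a sketch of division by a constant via UMULH, the following Python model divides an unsigned quadword by 10 using a multiply-high and a right shift. The constant and shift count are the standard reciprocal for 10 (the ceiling of 2**67 / 10) and are not taken from this manual; the function names are hypothetical, and Appendix A may describe a different sequence.

```python
MASK64 = (1 << 64) - 1


def umulh(a, b):
    """High 64 bits of the 128-bit unsigned product of two quadwords."""
    return ((a & MASK64) * (b & MASK64)) >> 64


# ceil(2**67 / 10); an assumed reciprocal constant, chosen for this example.
RECIPROCAL_OF_10 = 0xCCCCCCCCCCCCCCCD


def udiv10(n):
    """Unsigned n // 10 for any 64-bit n, using UMULH and a shift by 3."""
    return umulh(n, RECIPROCAL_OF_10) >> 3
```

The multiply-high contributes a shift by 64 and the explicit shift contributes 3 more, so the quotient is the product divided by 2**67.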
4.4.1 Longword Add


Format:


ADDL    Ra.rq,Rb.rq,Rc.wq                   !Operate format
ADDL    Ra.rq,#b.ib,Rc.wq                   !Operate format


Operation:


Rc ← SEXT((Rav + Rbv)<31:0>)


Exceptions: 


Integer Overflow 


Instruction mnemonics: 


ADDL Add Longword 


Qualifiers: 


Integer Overflow Enable (/V) 


Description: 


Register Ra is added to register Rb or a literal, and the sign-extended 32-bit sum is
written to Rc.


The high order 32 bits of Ra and Rb are ignored. Rc is a proper sign extension
of the truncated 32-bit sum. Overflow detection is based on the longword
sum Rav<31:0> + Rbv<31:0>.
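The longword add and its overflow condition can be modeled as follows. The function names and the returned (result, overflow) pair are hypothetical conveniences; in the architecture, overflow is reported as an exception only when the /V qualifier is specified.

```python
MASK64 = (1 << 64) - 1


def sext32(value):
    """Sign-extend the low 32 bits of value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def addl(rav, rbv):
    """Return (Rc, overflow) for ADDL.

    Rc is SEXT((Rav + Rbv)<31:0>) as a 64-bit value. Overflow is judged
    on the signed longword sum Rav<31:0> + Rbv<31:0>.
    """
    total = sext32(rav) + sext32(rbv)       # high 32 bits of inputs ignored
    rc = sext32(total) & MASK64
    overflow = not (-(1 << 31) <= total < (1 << 31))
    return rc, overflow
```

For example, adding 1 to 7FFFFFFF (hex) overflows the longword range and yields the sign-extended result FFFFFFFF80000000 (hex).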
4.4.2 Scaled Longword Add


Format:


SxADDL  Ra.rq,Rb.rq,Rc.wq                   !Operate format
SxADDL  Ra.rq,#b.ib,Rc.wq                   !Operate format


Operation:


CASE
    S4ADDL: Rc ← SEXT(((LEFT_SHIFT(Rav,2)) + Rbv)<31:0>)
    S8ADDL: Rc ← SEXT(((LEFT_SHIFT(Rav,3)) + Rbv)<31:0>)
ENDCASE


Exceptions: 


None 


Instruction mnemonics: 


S4ADDL Scaled Add Longword by 4
S8ADDL Scaled Add Longword by 8


Qualifiers: 


None 


Description: 


Register Ra is scaled by 4 (for S4ADDL) or 8 (for S8ADDL) and is added to register
Rb or a literal, and the sign-extended 32-bit sum is written to Rc.


The high 32 bits of Ra and Rb are ignored. Rc is a proper sign extension of the
truncated 32-bit sum.
4.4.3 Quadword Add


Format:
ADDQ    Ra.rq,Rb.rq,Rc.wq    !Operate format
ADDQ    Ra.rq,#b.ib,Rc.wq    !Operate format


Operation:


Rc ← Rav + Rbv


Exceptions: 


Integer Overflow 


Instruction mnemonics: 


ADDQ Add Quadword 


Qualifiers: 


Integer Overflow Enable (/V) 


Description: 


Register Ra is added to register Rb or a literal, and the 64-bit sum is written to Rc.


On overflow, the least significant 64 bits of the true result are written to the 
destination register. 


The unsigned compare instructions can be used to generate carry. After adding two 
values, if the sum is less unsigned than either one of the inputs, there was a carry 
out of the most significant bit. 
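
The carry technique described above can be sketched as a 128-bit add built from quadword operations. The Python helpers below are hypothetical models of ADDQ and CMPULT, with registers assumed to be unsigned 64-bit integers.

```python
MASK64 = (1 << 64) - 1

def addq(a, b):
    """Model of ADDQ without /V: low 64 bits of the sum."""
    return (a + b) & MASK64

def cmpult(a, b):
    """Model of CMPULT: 1 if a < b as unsigned quadwords, else 0."""
    return 1 if (a & MASK64) < (b & MASK64) else 0

def add128(a_lo, a_hi, b_lo, b_hi):
    """Add two 128-bit values held as (low, high) quadword pairs."""
    lo = addq(a_lo, b_lo)
    carry = cmpult(lo, a_lo)          # sum below an input means carry out
    hi = addq(addq(a_hi, b_hi), carry)
    return lo, hi
```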
4.4.4 Scaled Quadword Add


Format:
SxADDQ  Ra.rq,Rb.rq,Rc.wq    !Operate format
SxADDQ  Ra.rq,#b.ib,Rc.wq    !Operate format
Operation:
CASE
  S4ADDQ: Rc ← LEFT_SHIFT(Rav,2) + Rbv
  S8ADDQ: Rc ← LEFT_SHIFT(Rav,3) + Rbv
ENDCASE


Exceptions: 


None 


Instruction mnemonics: 


S4ADDQ Scaled Add Quadword by 4
S8ADDQ Scaled Add Quadword by 8


Qualifiers: 


None 


Description: 


Register Ra is scaled by 4 (for S4ADDQ) or 8 (for S8ADDQ) and is added to register
Rb or a literal, and the 64-bit sum is written to Rc.


On overflow, the least significant 64 bits of the true result are written to the
destination register.
4.4.5 Integer Signed Compare


Format:
CMPxx   Ra.rq,Rb.rq,Rc.wq    !Operate format
CMPxx   Ra.rq,#b.ib,Rc.wq    !Operate format
Operation:
IF Rav SIGNED_RELATION Rbv THEN
  Rc ← 1
ELSE
  Rc ← 0
Exceptions:
None


Instruction mnemonics: 


CMPEQ Compare Signed Quadword Equal 
CMPLE Compare Signed Quadword Less Than or Equal 
CMPLT Compare Signed Quadword Less Than 


Qualifiers: 


None 


Description: 


Register Ra is compared to Register Rb or a literal. If the specified relationship is
true, the value one is written to register Rc; otherwise, zero is written to Rc.


Notes:


• Compare Less Than A,B is the same as Compare Greater Than B,A; Compare
  Less Than or Equal A,B is the same as Compare Greater Than or Equal B,A.
  Therefore, only the less-than operations are included.
4.4.6 Integer Unsigned Compare


Format:


CMPUxx  Ra.rq,Rb.rq,Rc.wq    !Operate format
CMPUxx  Ra.rq,#b.ib,Rc.wq    !Operate format


Operation:


IF Rav UNSIGNED_RELATION Rbv THEN
  Rc ← 1
ELSE
  Rc ← 0


Exceptions: 


None 


Instruction mnemonics: 


CMPULE Compare Unsigned Quadword Less Than or Equal 
CMPULT Compare Unsigned Quadword Less Than 


Qualifiers: 
None 
Description: 


Register Ra is compared to Register Rb or a literal. If the specified relationship is
true, the value one is written to register Rc; otherwise, zero is written to Rc.
4.4.7 Longword Multiply


Format:
MULL    Ra.rq,Rb.rq,Rc.wq    !Operate format
MULL    Ra.rq,#b.ib,Rc.wq    !Operate format
Operation:


Rc ← SEXT((Rav * Rbv)<31:0>)


Exceptions: 


Integer Overflow 


Instruction mnemonics:


MULL Multiply Longword 


Qualifiers: 


Integer Overflow Enable (/V) 


Description: 


Register Ra is multiplied by register Rb or a literal, and the sign-extended 32-bit
product is written to Rc.


The high 32 bits of Ra and Rb are ignored. Rc is a proper sign extension
of the truncated 32-bit product. Overflow detection is based on the longword
product Rav<31:0> * Rbv<31:0>. On overflow, the proper sign extension of the least
significant 32 bits of the true result is written to the destination register.


The MULQ instruction can be used to return the full 64-bit product. 
4.4.8 Quadword Multiply


Format:


MULQ    Ra.rq,Rb.rq,Rc.wq    !Operate format
MULQ    Ra.rq,#b.ib,Rc.wq    !Operate format


Operation:


Rc ← Rav * Rbv
Exceptions:
Integer Overflow


Instruction mnemonics:


MULQ Multiply Quadword


Qualifiers:


Integer Overflow Enable (/V)


Description:


Register Ra is multiplied by register Rb or a literal, and the 64-bit product is written
to register Rc. Overflow detection is based on considering the operands and the result
as signed quantities. On overflow, the least significant 64 bits of the true result are
written to the destination register.


The UMULH instruction can be used to generate the upper 64 bits of the 128-bit
result when an overflow occurs.


4.4.9 Unsigned Quadword Multiply High


Format:
UMULH   Ra.rq,Rb.rq,Rc.wq    !Operate format
UMULH   Ra.rq,#b.ib,Rc.wq    !Operate format
Operation:


Rc ← {Rav *U Rbv}<127:64>


Exceptions: 


None 


Instruction mnemonics: 


UMULH Unsigned Multiply Quadword High 


Qualifiers: 


None 


Description: 


Register Ra and Rb or a literal are multiplied as unsigned numbers to produce a
128-bit result. The high-order 64 bits are written to register Rc.


The UMULH instruction can be used to generate the upper 64 bits of a 128-bit result
as follows:


Ra and Rb are unsigned: result of UMULH
Ra and Rb are signed: (result of UMULH) - Ra<63>*Rb - Rb<63>*Ra


The MULQ instruction gives the low 64 bits of the result in either case.
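
The signed correction above can be checked with a small Python model. The names `umulh` and `smulh` are hypothetical; registers are assumed to hold unsigned 64-bit bit patterns, and the subtraction is taken modulo 2**64.

```python
MASK64 = (1 << 64) - 1

def umulh(a, b):
    """Model of UMULH: high 64 bits of the unsigned 128-bit product."""
    return ((a & MASK64) * (b & MASK64)) >> 64

def smulh(a, b):
    """High 64 bits of the signed 128-bit product, derived from UMULH."""
    a &= MASK64
    b &= MASK64
    hi = umulh(a, b)
    if a >> 63:        # Ra<63> set: subtract Rb
        hi -= b
    if b >> 63:        # Rb<63> set: subtract Ra
        hi -= a
    return hi & MASK64

def to_signed(x):
    """Reinterpret a 64-bit pattern as a signed integer (reference helper)."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x
```

The result can be compared against the exact product computed with arbitrary-precision integers, shifted right by 64.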
4.4.10 Longword Subtract


Format:
SUBL    Ra.rq,Rb.rq,Rc.wq    !Operate format
SUBL    Ra.rq,#b.ib,Rc.wq    !Operate format
Operation:


Rc ← SEXT((Rav - Rbv)<31:0>)


Exceptions: 


Integer Overflow 


Instruction mnemonics: 


SUBL Subtract Longword 


Qualifiers: 


Integer Overflow Enable (/V) 


Description: 


Register Rb or a literal is subtracted from register Ra, and the sign-extended 32-bit
difference is written to Rc.


The high 32 bits of Ra and Rb are ignored. Rc is a proper sign extension of the
truncated 32-bit difference. Overflow detection is based on the longword difference
Rav<31:0> - Rbv<31:0>.
4.4.11 Scaled Longword Subtract


Format:


SxSUBL  Ra.rq,Rb.rq,Rc.wq    !Operate format
SxSUBL  Ra.rq,#b.ib,Rc.wq    !Operate format


Operation:


CASE
  S4SUBL: Rc ← SEXT(((LEFT_SHIFT(Rav,2)) - Rbv)<31:0>)
  S8SUBL: Rc ← SEXT(((LEFT_SHIFT(Rav,3)) - Rbv)<31:0>)
ENDCASE


Exceptions: 


None 


Instruction mnemonics: 


S4SUBL Scaled Subtract Longword by 4 
S8SUBL Scaled Subtract Longword by 8 


Qualifiers: 


None 


Description: 


Register Rb or a literal is subtracted from the scaled value of register Ra, which is
scaled by 4 (for S4SUBL) or 8 (for S8SUBL), and the sign-extended 32-bit difference
is written to Rc.


The high 32 bits of Ra and Rb are ignored. Rc is a proper sign extension of the 
truncated 32-bit difference. 
4.4.12 Quadword Subtract


Format:
SUBQ    Ra.rq,Rb.rq,Rc.wq    !Operate format
SUBQ    Ra.rq,#b.ib,Rc.wq    !Operate format
Operation:


Rc ← Rav - Rbv


Exceptions: 


Integer Overflow 


Instruction mnemonics: 


SUBQ Subtract Quadword 


Qualifiers: 


Integer Overflow Enable (/V) 


Description: 


Register Rb or a literal is subtracted from register Ra, and the 64-bit difference is 
written to register Rc. On overflow, the least significant 64 bits of the true result 
are written to the destination register. 


The unsigned compare instructions can be used to generate borrow. If the minuend 
(Rav) is less unsigned than the subtrahend (Rbv), there will be a borrow. 
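
The borrow rule can be sketched as a 128-bit subtract. The Python helpers are hypothetical models of SUBQ and CMPULT on unsigned 64-bit register values.

```python
MASK64 = (1 << 64) - 1

def subq(a, b):
    """Model of SUBQ without /V: low 64 bits of the difference."""
    return (a - b) & MASK64

def cmpult(a, b):
    """Model of CMPULT: 1 if a < b as unsigned quadwords, else 0."""
    return 1 if (a & MASK64) < (b & MASK64) else 0

def sub128(a_lo, a_hi, b_lo, b_hi):
    """Subtract 128-bit values held as (low, high) quadword pairs."""
    borrow = cmpult(a_lo, b_lo)        # minuend below subtrahend means borrow
    lo = subq(a_lo, b_lo)
    hi = subq(subq(a_hi, b_hi), borrow)
    return lo, hi
```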
4.4.13 Scaled Quadword Subtract


Format:


SxSUBQ  Ra.rq,Rb.rq,Rc.wq    !Operate format
SxSUBQ  Ra.rq,#b.ib,Rc.wq    !Operate format


Operation:


CASE
  S4SUBQ: Rc ← LEFT_SHIFT(Rav,2) - Rbv
  S8SUBQ: Rc ← LEFT_SHIFT(Rav,3) - Rbv
ENDCASE


Exceptions: 
None 


Instruction mnemonics: 


S4SUBQ Scaled Subtract Quadword by 4 
S8SUBQ Scaled Subtract Quadword by 8 


Qualifiers: 


None 


Description:


Register Rb or a literal is subtracted from the scaled value of register Ra, which is
scaled by 4 (for S4SUBQ) or 8 (for S8SUBQ), and the 64-bit difference is written to
Rc.
4.5 Logical and Shift Instructions


The logical instructions perform quadword Boolean operations. The conditional move
integer instructions perform conditionals without a branch. The shift instructions
perform left and right logical shift and right arithmetic shift. These are summarized
in Table 4-6.


Table 4-6: Logical and Shift Instructions Summary

Mnemonic    Operation

AND         Logical Product
BIC         Logical Product with Complement
BIS         Logical Sum (OR)
EQV         Logical Equivalence (XORNOT)
ORNOT       Logical Sum with Complement
XOR         Logical Difference

CMOVxx      Conditional Move Integer

SLL         Shift Left Logical
SRA         Shift Right Arithmetic
SRL         Shift Right Logical


SOFTWARE NOTE 
There is no arithmetic left shift instruction. Where an 
arithmetic left shift would be used, a logical shift will 
do. For multiplying by a small power of two in address 
computations, logical left shift is acceptable. 


Integer multiply should be used to perform an arithmetic left shift with overflow
checking.


Bit field extracts can be done with two logical shifts. Sign extension can be done 
with left logical shift and a right arithmetic shift. 
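
The two shift idioms mentioned above can be sketched in Python. The helper names are hypothetical, registers are modeled as unsigned 64-bit integers, and a field is assumed to satisfy 1 <= width and pos + width <= 64.

```python
MASK64 = (1 << 64) - 1

def sll(x, n):
    """Model of SLL: logical left shift by n<5:0>."""
    return (x << (n & 63)) & MASK64

def srl(x, n):
    """Model of SRL: logical right shift by n<5:0>."""
    return (x & MASK64) >> (n & 63)

def sra(x, n):
    """Model of SRA: arithmetic right shift by n<5:0>."""
    x &= MASK64
    if x >> 63:
        x -= 1 << 64
    return (x >> (n & 63)) & MASK64

def extract_field(x, pos, width):
    """Zero-extended bits <pos+width-1:pos>: SLL to the top, then SRL down."""
    return srl(sll(x, 64 - pos - width), 64 - width)

def sign_extend_field(x, pos, width):
    """Sign-extended bits <pos+width-1:pos>: SLL to the top, then SRA down."""
    return sra(sll(x, 64 - pos - width), 64 - width)
```

For example, sign-extending a longword held in the low half of a register corresponds to `sign_extend_field(x, 0, 32)`, that is, SLL by 32 followed by SRA by 32.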
4.5.1 Logical Functions


Format:


mnemonic  Ra.rq,Rb.rq,Rc.wq    !Operate format
mnemonic  Ra.rq,#b.ib,Rc.wq    !Operate format


Operation:
Rc ← Rav AND Rbv            !AND
Rc ← Rav OR Rbv             !BIS
Rc ← Rav XOR Rbv            !XOR
Rc ← Rav AND {NOT Rbv}      !BIC
Rc ← Rav OR {NOT Rbv}       !ORNOT
Rc ← Rav XOR {NOT Rbv}      !EQV
Exceptions:
None


Instruction mnemonics:


AND Logical Product
BIC Logical Product with Complement
BIS Logical Sum (OR)
EQV Logical Equivalence (XORNOT)
ORNOT Logical Sum with Complement
XOR Logical Difference

Qualifiers:
None


Description:


These instructions perform the designated Boolean function between register Ra and
register Rb or a literal. The result is written to register Rc.


The “NOT” function can be performed by doing an ORNOT with zero (Ra = R31).
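
A brief Python sketch of the complemented forms, including NOT built from ORNOT with a zero first operand. The function names are hypothetical models operating on unsigned 64-bit values.

```python
MASK64 = (1 << 64) - 1

def bic(a, b):
    """Model of BIC: a AND (NOT b)."""
    return a & ~b & MASK64

def ornot(a, b):
    """Model of ORNOT: a OR (NOT b)."""
    return (a | ~b) & MASK64

def eqv(a, b):
    """Model of EQV: a XOR (NOT b)."""
    return (a ^ ~b) & MASK64

def not64(x):
    """NOT x, formed as ORNOT with a zero first operand (Ra = R31)."""
    return ornot(0, x)
```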
4.5.2 Conditional Move Integer


Format:
CMOVxx  Ra.rq,Rb.rq,Rc.wq    !Operate format
CMOVxx  Ra.rq,#b.ib,Rc.wq    !Operate format
Operation:


IF TEST(Rav, Condition_based_on_Opcode) THEN
  Rc ← Rbv


Exceptions: 


None 


Instruction mnemonics: 


CMOVEQ CMOVE if Register Equal to Zero 

CMOVGE CMOVE if Register Greater Than or Equal to Zero 
CMOVGT  CMOVE if Register Greater Than Zero 
CMOVLBC CMOVE if Register Low Bit Clear 

CMOVLBS CMOVE if Register Low Bit Set 

CMOVLE CMOVE if Register Less Than or Equal to Zero 
CMOVLT CMOVE if Register Less Than Zero 

CMOVNE  CMOVE if Register Not Equal to Zero 


Qualifiers: 


None 


Description: 


Register Ra is tested. If the specified relationship is true, the value Rbv is written 
to register Rc. 
Notes:
Except that it is likely in many implementations to be substantially faster, the
instruction:


    CMOVEQ Ra,Rb,Rc


is exactly equivalent to:


    BNE    Ra,label
    OR     Rb,Rb,Rc
label:


For example, a branchless sequence for:
R1=MAX(R1,R2)
is:


    CMPLT  R1,R2,R3    ! R3=1 if R1<R2
    CMOVNE R3,R2,R1    ! Move R2 to R1 if R1<R2
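
The MAX sequence can be traced with a small Python model. The helpers `cmplt` and `cmovne` are hypothetical; `cmovne` takes the old destination value as a third argument because the destination is left unchanged when the test fails.

```python
MASK64 = (1 << 64) - 1

def to_signed(x):
    """Reinterpret a 64-bit pattern as a signed quadword."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def cmplt(rav, rbv):
    """Model of CMPLT: 1 if Rav < Rbv as signed quadwords, else 0."""
    return 1 if to_signed(rav) < to_signed(rbv) else 0

def cmovne(rav, rbv, rc_old):
    """Model of CMOVNE: new Rc is Rbv if Rav != 0, otherwise Rc is unchanged."""
    return rbv if (rav & MASK64) != 0 else rc_old

def max_branchless(r1, r2):
    """R1 = MAX(R1, R2) using the CMPLT / CMOVNE sequence."""
    r3 = cmplt(r1, r2)          # R3 = 1 if R1 < R2
    r1 = cmovne(r3, r2, r1)     # move R2 to R1 if R1 < R2
    return r1
```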
4.5.3 Shift Logical


Format:


SxL     Ra.rq,Rb.rq,Rc.wq    !Operate format
SxL     Ra.rq,#b.ib,Rc.wq    !Operate format
Operation:

Rc ← LEFT_SHIFT(Rav, Rbv<5:0>)     !SLL
Rc ← RIGHT_SHIFT(Rav, Rbv<5:0>)    !SRL
Exceptions: 

None 


Instruction mnemonics: 


SLL Shift Left Logical 
SRL Shift Right Logical 
Qualifiers: 
None 
Description: 


Register Ra is shifted logically left or right 0 to 63 bits by the count in register Rb 
or a literal. The result is written to register Rc. Zero bits are propagated into the
vacated bit positions. 
4.5.4 Shift Arithmetic


Format:
SRA     Ra.rq,Rb.rq,Rc.wq    !Operate format
SRA     Ra.rq,#b.ib,Rc.wq    !Operate format
Operation:


Rc ← ARITH_RIGHT_SHIFT(Rav, Rbv<5:0>)
Exceptions: 

None 
Instruction mnemonics: 

SRA Shift Right Arithmetic 
Qualifiers: 

None 


Description: 
Register Ra is right shifted arithmetically 0 to 63 bits by the count in register Rb or
a literal. The result is written to register Rc. The sign bit (Rav<63>) is propagated
into the vacated bit positions.
4.6 Byte-Manipulation Instructions


Alpha provides instructions for operating on byte operands within registers.
These instructions allow full-width memory accesses in the load/store instructions
combined with powerful in-register byte manipulation.


The instructions are summarized in Table 4-7.


Table 4-7: Byte-Manipulation Instructions Summary


Mnemonic Operation 

CMPBGE Compare Byte 

EXTBL Extract Byte Low 
EXTWL Extract Word Low 
EXTLL Extract Longword Low 
EXTQL Extract Quadword Low
EXTWH Extract Word High 
EXTLH Extract Longword High 
EXTQH Extract Quadword High 
INSBL Insert Byte Low 
INSWL Insert Word Low 
INSLL Insert Longword Low 
INSQL Insert Quadword Low 
INSWH Insert Word High 
INSLH Insert Longword High 
INSQH Insert Quadword High 
MSKBL Mask Byte Low 
MSKWL Mask Word Low 
MSKLL Mask Longword Low 
MSKQL Mask Quadword Low 
MSKWH Mask Word High 
MSKLH Mask Longword High 
MSKQH Mask Quadword High 
ZAP Zero Bytes
ZAPNOT Zero Bytes Not
4.6.1 Compare Byte


Format:
CMPBGE  Ra.rq,Rb.rq,Rc.wq    !Operate format
CMPBGE  Ra.rq,#b.ib,Rc.wq    !Operate format
Operation:


FOR i FROM 0 TO 7
  temp<8:0> ← {0 || Rav<i*8+7:i*8>} +
              {0 || NOT Rbv<i*8+7:i*8>} + 1
  Rc<i> ← temp<8>
END
Rc<63:8> ← 0


Exceptions:
None


Instruction mnemonics:


CMPBGE Compare Byte
Qualifiers: 
None 


Description: 


CMPBGE does eight parallel unsigned byte comparisons between corresponding
bytes of Rav and Rbv, storing the eight results in the low eight bits of Rc. The
high 56 bits of Rc are set to zero. Bit 0 of Rc corresponds to byte 0, bit 1 of Rc
corresponds to byte 1, and so forth. A result bit is set in Rc if the corresponding byte
of Rav is greater than or equal to the corresponding byte of Rbv (unsigned).
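
The operation can be modeled directly from the pseudocode. The Python function below is a hypothetical sketch that follows the 9-bit add of Rav and the complement of Rbv, taking bit 8 as the carry out.

```python
def cmpbge(rav, rbv):
    """Model of CMPBGE: bit i of the result is 1 if byte i of rav >= byte i of rbv."""
    rc = 0
    for i in range(8):
        a = (rav >> (8 * i)) & 0xFF
        b = (rbv >> (8 * i)) & 0xFF
        temp = a + (~b & 0xFF) + 1        # 9-bit sum; bit 8 is the carry out
        rc |= ((temp >> 8) & 1) << i
    return rc                             # bits <63:8> are zero
```

With Rav = R31 (zero), a result bit is set exactly where the corresponding byte of Rbv is zero, which is the property used for string-terminator scans.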
Notes:


The result of CMPBGE can be used as an input to ZAP and ZAPNOT.


To scan for a byte of zeros in a character string:


    <initialize R1 to aligned QW address of string>
LOOP:
    LDQ     R2,0(R1)      ; Pick up 8 bytes
    LDA     R1,8(R1)      ; Increment string pointer
    CMPBGE  R31,R2,R3     ; If NO bytes of zero, R3<7:0>=0
    BEQ     R3,LOOP       ; Loop if no terminator byte found
    ...                   ; At this point, R3 can be used to
                          ; determine which byte terminated


To compare two character strings for greater/less:


    <initialize R1 to aligned QW address of string1>
    <initialize R2 to aligned QW address of string2>
LOOP:
    LDQ     R3,0(R1)      ; Pick up 8 bytes of string1
    LDA     R1,8(R1)      ; Increment string1 pointer
    LDQ     R4,0(R2)      ; Pick up 8 bytes of string2
    LDA     R2,8(R2)      ; Increment string2 pointer
    XOR     R3,R4,R5      ; Test for all equal bytes
    BEQ     R5,LOOP       ; Loop if all equal
    CMPBGE  R31,R5,R5     ;
    ...                   ; At this point, R5 can be used to
                          ; determine the first not-equal
                          ; byte position.


To range-check a string of characters in R1 for ‘0’..‘9’:


    LDQ     R2,lit0s      ; Pick up 8 bytes of the character
                          ; BELOW ‘0’ ‘////////’
    LDQ     R3,lit9s      ; Pick up 8 bytes of the character
                          ; ABOVE ‘9’ ‘::::::::’
    CMPBGE  R2,R1,R4      ; Some R4<i>=1 if character is LT ‘0’
    CMPBGE  R1,R3,R5      ; Some R5<i>=1 if character is GT ‘9’
    BNE     R4,ERROR      ; Branch if some char too low
    BNE     R5,ERROR      ; Branch if some char too high
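
The range check can be tried out with a Python model of CMPBGE. This is a sketch: `cmpbge` and `all_digits` are hypothetical helpers, the 8 characters are assumed to be packed little-endian into one quadword, and LIT0S and LIT9S stand for the lit0s and lit9s constants.

```python
def cmpbge(rav, rbv):
    """Model of CMPBGE: bit i is 1 if byte i of rav >= byte i of rbv (unsigned)."""
    rc = 0
    for i in range(8):
        a = (rav >> (8 * i)) & 0xFF
        b = (rbv >> (8 * i)) & 0xFF
        rc |= (1 if a >= b else 0) << i
    return rc

LIT0S = int.from_bytes(b"////////", "little")   # character just below '0'
LIT9S = int.from_bytes(b"::::::::", "little")   # character just above '9'

def all_digits(chunk8):
    """True if all 8 bytes of chunk8 are in '0'..'9'."""
    r1 = int.from_bytes(chunk8, "little")
    too_low = cmpbge(LIT0S, r1)     # bit set where '/' >= character
    too_high = cmpbge(r1, LIT9S)    # bit set where character >= ':'
    return too_low == 0 and too_high == 0
```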
4.6.2 Extract Byte


Format:
EXTxx   Ra.rq,Rb.rq,Rc.wq    !Operate format
EXTxx   Ra.rq,#b.ib,Rc.wq    !Operate format
Operation:
CASE
  EXTBL: byte_mask ← 0000 0001₂
  EXTWx: byte_mask ← 0000 0011₂
  EXTLx: byte_mask ← 0000 1111₂
  EXTQx: byte_mask ← 1111 1111₂
ENDCASE
CASE
  EXTxL:
    byte_loc ← Rbv<2:0>*8
    temp ← RIGHT_SHIFT(Rav, byte_loc<5:0>)
    Rc ← BYTE_ZAP(temp, NOT(byte_mask))
  EXTxH:
    byte_loc ← 64 - Rbv<2:0>*8
    temp ← LEFT_SHIFT(Rav, byte_loc<5:0>)
    Rc ← BYTE_ZAP(temp, NOT(byte_mask))
ENDCASE
Exceptions: 
None 


Instruction mnemonics: 


EXTBL Extract Byte Low
EXTWL Extract Word Low
EXTLL Extract Longword Low
EXTQL Extract Quadword Low
EXTWH Extract Word High
EXTLH Extract Longword High
EXTQH Extract Quadword High
Qualifiers:


None 


Description: 


EXTxL shifts register Ra right by 0 to 7 bytes, inserts zeros into vacated bit positions, 
and then extracts 1, 2, 4, or 8 bytes into register Rc. EXTxH shifts register Ra left 
by 0 to 7 bytes, inserts zeros into vacated bit positions, and then extracts 2, 4, or 8 
bytes into register Rc. The number of bytes to shift is specified by Rbv<2:0>. The 
number of bytes to extract is specified in the function code. Remaining bytes are 
filled with zeros. 
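
The extract operations can be modeled in Python as below. This is a sketch with hypothetical names (`byte_zap`, `ext_low`, `ext_high`, `load_unaligned_quad`); memory is modeled as a little-endian byte string and LDQ_U as a read of the enclosing aligned quadword.

```python
MASK64 = (1 << 64) - 1
WIDTH_MASK = {"B": 0x01, "W": 0x03, "L": 0x0F, "Q": 0xFF}

def byte_zap(x, mask):
    """Zero byte i of x wherever bit i of mask is 1."""
    for i in range(8):
        if (mask >> i) & 1:
            x &= ~(0xFF << (8 * i))
    return x & MASK64

def ext_low(size, rav, rbv):
    """Model of EXTxL: shift right by Rbv<2:0> bytes, keep the low bytes."""
    byte_loc = (rbv & 7) * 8
    temp = (rav & MASK64) >> byte_loc
    return byte_zap(temp, ~WIDTH_MASK[size] & 0xFF)

def ext_high(size, rav, rbv):
    """Model of EXTxH: shift left by (64 - Rbv<2:0>*8)<5:0> bits, keep the low bytes."""
    byte_loc = (64 - (rbv & 7) * 8) & 63
    temp = (rav << byte_loc) & MASK64
    return byte_zap(temp, ~WIDTH_MASK[size] & 0xFF)

def load_unaligned_quad(mem, addr):
    """LDQ_U / LDQ_U / EXTQL / EXTQH / OR sequence for an unaligned quadword."""
    def ldq_u(a):
        base = a & ~7                      # LDQ_U ignores va<2:0>
        return int.from_bytes(mem[base:base + 8], "little")
    r1 = ldq_u(addr)
    r2 = ldq_u(addr + 7)
    return ext_high("Q", r2, addr) | ext_low("Q", r1, addr)
```

When the address is already aligned, both loads read the same quadword, EXTQH shifts by zero, and the OR leaves the value unchanged.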


Notes: 

The comments in the examples below assume that the effective address (ea) of 
X(R11) is such that (ea mod 8) = 5, the value of the aligned quadword containing 
X(R11) is CBAx xxxx, and the value of the aligned quadword containing X+7(R11) is 
yyyH GFED. 


The examples below are the most general case unless otherwise noted; if more 
information is known about the value or intended alignment of X, shorter sequences 
can be used. 


The intended sequence for loading a quadword from unaligned address X(R11) is:


    LDQ_U   R1,X(R11)     ; Ignores va<2:0>, R1 = CBAx xxxx
    LDQ_U   R2,X+7(R11)   ; Ignores va<2:0>, R2 = yyyH GFED
    LDA     R3,X(R11)     ; R3<2:0> = (X mod 8) = 5
    EXTQL   R1,R3,R1      ; R1 = 0000 0CBA
    EXTQH   R2,R3,R2      ; R2 = HGFE D000
    OR      R2,R1,R1      ; R1 = HGFE DCBA


The intended sequence for loading and zero-extending a longword from unaligned
address X is:


    LDQ_U   R1,X(R11)     ; Ignores va<2:0>, R1 = CBAx xxxx
    LDQ_U   R2,X+3(R11)   ; Ignores va<2:0>, R2 = yyyy yyyD
    LDA     R3,X(R11)     ; R3<2:0> = (X mod 8) = 5
    EXTLL   R1,R3,R1      ; R1 = 0000 0CBA
    EXTLH   R2,R3,R2      ; R2 = 0000 D000
    OR      R2,R1,R1      ; R1 = 0000 DCBA


The intended sequence for loading and sign-extending a longword from unaligned 
address X is: 


    LDQ_U   R1,X(R11)     ; Ignores va<2:0>, R1 = CBAx xxxx
    LDQ_U   R2,X+3(R11)   ; Ignores va<2:0>, R2 = yyyy yyyD
    LDA     R3,X(R11)     ; R3<2:0> = (X mod 8) = 5
    EXTLL   R1,R3,R1      ; R1 = 0000 0CBA
    EXTLH   R2,R3,R2      ; R2 = 0000 D000
    OR      R2,R1,R1      ; R1 = 0000 DCBA
    SLL     R1,#32,R1     ; R1 = DCBA 0000
    SRA     R1,#32,R1     ; R1 = ssss DCBA

The intended sequence for loading and zero-extending a word from unaligned address
X is:


    LDQ_U   R1,X(R11)     ; Ignores va<2:0>, R1 = yBAx xxxx
    LDQ_U   R2,X+1(R11)   ; Ignores va<2:0>, R2 = yBAx xxxx
    LDA     R3,X(R11)     ; R3<2:0> = (X mod 8) = 5
    EXTWL   R1,R3,R1      ; R1 = 0000 00BA
    EXTWH   R2,R3,R2      ; R2 = 0000 0000
    OR      R2,R1,R1      ; R1 = 0000 00BA

The intended sequence for loading and sign-extending a word from unaligned address
X is:


    LDQ_U   R1,X(R11)     ; Ignores va<2:0>, R1 = yBAx xxxx
    LDQ_U   R2,X+1(R11)   ; Ignores va<2:0>, R2 = yBAx xxxx
    LDA     R3,X(R11)     ; R3<2:0> = (X mod 8) = 5
    EXTWL   R1,R3,R1      ; R1 = 0000 00BA
    EXTWH   R2,R3,R2      ; R2 = 0000 0000
    OR      R2,R1,R1      ; R1 = 0000 00BA
    SLL     R1,#48,R1     ; R1 = BA00 0000
    SRA     R1,#48,R1     ; R1 = ssss ssBA


The intended sequence for loading and zero-extending a byte from address X is:


    LDQ_U   R1,X(R11)     ; Ignores va<2:0>, R1 = yyAx xxxx
    LDA     R3,X(R11)     ; R3<2:0> = (X mod 8) = 5
    EXTBL   R1,R3,R1      ; R1 = 0000 000A


The intended sequence for loading and sign-extending a byte from address X is:


    LDQ_U   R1,X(R11)     ; Ignores va<2:0>, R1 = yyAx xxxx
    LDA     R3,X+1(R11)   ; R3<2:0> = (X + 1) mod 8, i.e.,
                          ; convert byte position within
                          ; quadword to one-origin based
    EXTQH   R1,R3,R1      ; Places the desired byte into byte 7
                          ; of R1.final by left shifting
                          ; R1.initial by ( 8 - R3<2:0> ) byte
                          ; positions
    SRA     R1,#56,R1     ; Arithmetic Shift of byte 7 down
                          ; into byte 0,


Optimized examples: 


Assume that a word fetch is needed from 10(R3), where R3 is intended to contain 
a longword-aligned address. The optimized sequences below take advantage of the 
known constant offset, and the longword alignment (hence a single aligned longword 
contains the entire word). The sequences generate a Data Alignment Fault if R3 does 
not contain a longword-aligned address. 


The intended sequence for loading and zero-extending an aligned word from 10(R3)
is:


    LDL     R1,8(R3)      ; R1 = ssss BAxx
                          ; Faults if R3 is not longword aligned
    EXTWL   R1,#2,R1      ; R1 = 0000 00BA


The intended sequence for loading and sign-extending an aligned word from 10(R3)
is:


    LDL     R1,8(R3)      ; R1 = ssss BAxx
                          ; Faults if R3 is not longword aligned
    SRA     R1,#16,R1     ; R1 = ssss ssBA
4.6.3 Byte Insert


Format:
INSxx   Ra.rq,Rb.rq,Rc.wq    !Operate format
INSxx   Ra.rq,#b.ib,Rc.wq    !Operate format
Operation:
CASE
  INSBL: byte_mask ← 0000 0000 0000 0001₂
  INSWx: byte_mask ← 0000 0000 0000 0011₂
  INSLx: byte_mask ← 0000 0000 0000 1111₂
  INSQx: byte_mask ← 0000 0000 1111 1111₂
ENDCASE
byte_mask ← LEFT_SHIFT(byte_mask, Rbv<2:0>)
CASE
  INSxL:
    byte_loc ← Rbv<2:0>*8
    temp ← LEFT_SHIFT(Rav, byte_loc<5:0>)
    Rc ← BYTE_ZAP(temp, NOT(byte_mask<7:0>))
  INSxH:
    byte_loc ← 64 - Rbv<2:0>*8
    temp ← RIGHT_SHIFT(Rav, byte_loc<5:0>)
    Rc ← BYTE_ZAP(temp, NOT(byte_mask<15:8>))
ENDCASE


Exceptions: 


None 


Instruction mnemonics: 


INSBL Insert Byte Low 
INSWL Insert Word Low 
INSLL Insert Longword Low 
INSQL Insert Quadword Low 
INSWH Insert Word High 
INSLH Insert Longword High 
INSQH Insert Quadword High 
Qualifiers:
None


Description:


INSxL and INSxH shift bytes from register Ra and insert them into a field of zeros,
storing the result in register Rc. Register Rb<2:0> selects the shift amount, and the
function code selects the maximum field width: 1, 2, 4, or 8 bytes. The instructions
can generate a byte, word, longword, or quadword datum that is spread across two
registers at an arbitrary byte alignment.
4.6.4 Byte Mask


Format:
MSKxx   Ra.rq,Rb.rq,Rc.wq    !Operate format
MSKxx   Ra.rq,#b.ib,Rc.wq    !Operate format


Operation:


CASE
  MSKBL: byte_mask ← 0000 0000 0000 0001₂
  MSKWx: byte_mask ← 0000 0000 0000 0011₂
  MSKLx: byte_mask ← 0000 0000 0000 1111₂
  MSKQx: byte_mask ← 0000 0000 1111 1111₂
ENDCASE
byte_mask ← LEFT_SHIFT(byte_mask, Rbv<2:0>)
CASE
  MSKxL:
    Rc ← BYTE_ZAP(Rav, byte_mask<7:0>)
  MSKxH:
    Rc ← BYTE_ZAP(Rav, byte_mask<15:8>)
ENDCASE


Exceptions: 


None 


Instruction mnemonics: 


MSKBL Mask Byte Low 
MSKWL Mask Word Low 
MSKLL Mask Longword Low 
MSKQL Mask Quadword Low 
MSKWH Mask Word High 
MSKLH Mask Longword High 
MSKQH Mask Quadword High 


Qualifiers: 


None 


Description: 


MSKxL and MSKxH set selected bytes of register Ra to zero, storing the result
in register Rc. Register Rb<2:0> selects the starting position of the field of zero
bytes, and the function code selects the maximum width: 1, 2, 4, or 8 bytes. The
instructions generate a byte, word, longword, or quadword field of zeros that can
spread across two registers at an arbitrary byte alignment.


Notes: 

The comments in the examples below assume that the effective address (ea) of X(R11) 
is such that (ea mod 8) = 5, the value of the aligned quadword containing X(R11) is 
CBAx xxxx, the value of the aligned quadword containing X+7(R11) is yyyH GFED, 
and the value to be stored from R5 is hgfe dcba. 


The examples below are the most general case; if more information is known about 
the value or intended alignment of X, shorter sequences can be used. 


The intended sequence for storing an unaligned quadword R5 at address X(R11) is:


    LDA     R6,X(R11)     ; R6<2:0> = (X mod 8) = 5
    LDQ_U   R2,X+7(R11)   ; Ignores va<2:0>, R2 = yyyH GFED
    LDQ_U   R1,X(R11)     ; Ignores va<2:0>, R1 = CBAx xxxx
    INSQH   R5,R6,R4      ; R4 = 000h gfed
    INSQL   R5,R6,R3      ; R3 = cba0 0000
    MSKQH   R2,R6,R2      ; R2 = yyy0 0000
    MSKQL   R1,R6,R1      ; R1 = 000x xxxx
    OR      R2,R4,R2      ; R2 = yyyh gfed
    OR      R1,R3,R1      ; R1 = cbax xxxx
    STQ_U   R2,X+7(R11)   ; Must store high then low for
    STQ_U   R1,X(R11)     ; degenerate case of aligned QW
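
The store sequence can be simulated with Python models of the insert and mask operations. All names here are hypothetical, memory is modeled as a little-endian `bytearray`, and LDQ_U/STQ_U are modeled as accesses to the enclosing aligned quadword.

```python
MASK64 = (1 << 64) - 1
WIDTH_MASK = {"B": 0x01, "W": 0x03, "L": 0x0F, "Q": 0xFF}

def byte_zap(x, mask):
    """Zero byte i of x wherever bit i of mask is 1."""
    for i in range(8):
        if (mask >> i) & 1:
            x &= ~(0xFF << (8 * i))
    return x & MASK64

def ins_low(size, rav, rbv):
    """Model of INSxL."""
    shift = rbv & 7
    byte_mask = WIDTH_MASK[size] << shift
    temp = (rav << (shift * 8)) & MASK64
    return byte_zap(temp, ~byte_mask & 0xFF)

def ins_high(size, rav, rbv):
    """Model of INSxH; uses byte_mask<15:8> and a shift of (64 - 8*shift)<5:0>."""
    shift = rbv & 7
    byte_mask = WIDTH_MASK[size] << shift
    temp = (rav & MASK64) >> ((64 - shift * 8) & 63)
    return byte_zap(temp, ~(byte_mask >> 8) & 0xFF)

def msk_low(size, rav, rbv):
    """Model of MSKxL."""
    return byte_zap(rav, (WIDTH_MASK[size] << (rbv & 7)) & 0xFF)

def msk_high(size, rav, rbv):
    """Model of MSKxH."""
    return byte_zap(rav, ((WIDTH_MASK[size] << (rbv & 7)) >> 8) & 0xFF)

def store_unaligned_quad(mem, addr, r5):
    """LDQ_U / INSQx / MSKQx / OR / STQ_U sequence for an unaligned quadword store."""
    def ldq_u(a):
        base = a & ~7
        return int.from_bytes(mem[base:base + 8], "little")
    def stq_u(a, value):
        base = a & ~7
        mem[base:base + 8] = value.to_bytes(8, "little")
    r2 = ldq_u(addr + 7)
    r1 = ldq_u(addr)
    r2 = msk_high("Q", r2, addr) | ins_high("Q", r5, addr)
    r1 = msk_low("Q", r1, addr) | ins_low("Q", r5, addr)
    stq_u(addr + 7, r2)    # high first: in the aligned case this rewrites old data
    stq_u(addr, r1)        # low second: in the aligned case this holds the new data
```

In the aligned case both stores hit the same quadword; the high half carries the unmodified old contents, so it must be written first for the new data to survive.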


The intended sequence for storing an unaligned longword R5 at X is:


    LDA     R6,X(R11)     ; R6<2:0> = (X mod 8) = 5
    LDQ_U   R2,X+3(R11)   ; Ignores va<2:0>, R2 = yyyy yyyD
    LDQ_U   R1,X(R11)     ; Ignores va<2:0>, R1 = CBAx xxxx
    INSLH   R5,R6,R4      ; R4 = 0000 000d
    INSLL   R5,R6,R3      ; R3 = cba0 0000
    MSKLH   R2,R6,R2      ; R2 = yyyy yyy0
    MSKLL   R1,R6,R1      ; R1 = 000x xxxx
    OR      R2,R4,R2      ; R2 = yyyy yyyd
    OR      R1,R3,R1      ; R1 = cbax xxxx
    STQ_U   R2,X+3(R11)   ; Must store high then low for
    STQ_U   R1,X(R11)     ; degenerate case of aligned



The intended sequence for storing an unaligned word R5 at X is:


    LDA     R6,X(R11)     ; R6<2:0> = (X mod 8) = 5
    LDQ_U   R2,X+1(R11)   ; Ignores va<2:0>, R2 = yBAx xxxx
    LDQ_U   R1,X(R11)     ; Ignores va<2:0>, R1 = yBAx xxxx
    INSWH   R5,R6,R4      ; R4 = 0000 0000
    INSWL   R5,R6,R3      ; R3 = 0ba0 0000
    MSKWH   R2,R6,R2      ; R2 = yBAx xxxx
    MSKWL   R1,R6,R1      ; R1 = y00x xxxx
    OR      R2,R4,R2      ; R2 = yBAx xxxx
    OR      R1,R3,R1      ; R1 = ybax xxxx
    STQ_U   R2,X+1(R11)   ; Must store high then low for
    STQ_U   R1,X(R11)     ; degenerate case of aligned


The intended sequence for storing a byte R5 at X is:


    LDA     R6,X(R11)     ; R6<2:0> = (X mod 8) = 5
    LDQ_U   R1,X(R11)     ; Ignores va<2:0>, R1 = yyAx xxxx
    INSBL   R5,R6,R3      ; R3 = 00a0 0000
    MSKBL   R1,R6,R1      ; R1 = yy0x xxxx
    OR      R1,R3,R1      ; R1 = yyax xxxx
    STQ_U   R1,X(R11)     ;





4.6.5 Zero Bytes 


Format: 


ZAPx    Ra.rq,Rb.rq,Rc.wq    !Operate format
ZAPx    Ra.rq,#b.ib,Rc.wq    !Operate format
Operation:
CASE
  ZAP:
    Rc ← BYTE_ZAP(Rav, Rbv<7:0>)
  ZAPNOT:
    Rc ← BYTE_ZAP(Rav, NOT Rbv<7:0>)
ENDCASE
Exceptions:
None


Instruction mnemonics: 


ZAP Zero Bytes 
ZAPNOT Zero Bytes Not 
Qualifiers: 


None 


Description: 


ZAP and ZAPNOT set selected bytes of register Ra to zero, and store the result in 
register Rc. Register Rb<7:0> selects the bytes to be zeroed; bit 0 of Rbv corresponds 
to byte 0, bit 1 of Rbv corresponds to byte 1, and so on. A result byte is set to zero 
if the corresponding bit of Rbv is a one for ZAP and a zero for ZAPNOT. 
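
A short Python sketch of the two operations follows; `byte_zap`, `zap`, and `zapnot` are hypothetical model functions on unsigned 64-bit values.

```python
MASK64 = (1 << 64) - 1

def byte_zap(x, mask):
    """Zero byte i of x wherever bit i of mask is 1."""
    for i in range(8):
        if (mask >> i) & 1:
            x &= ~(0xFF << (8 * i))
    return x & MASK64

def zap(rav, rbv):
    """Model of ZAP: clear the bytes selected by ones in Rbv<7:0>."""
    return byte_zap(rav, rbv & 0xFF)

def zapnot(rav, rbv):
    """Model of ZAPNOT: clear the bytes selected by zeros in Rbv<7:0>."""
    return byte_zap(rav, ~rbv & 0xFF)
```

For instance, `zapnot(x, 0x0F)` keeps only the low longword of `x`, which zero-extends it to a quadword.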
4.7 Floating-Point Instructions


Alpha provides instructions for operating on floating-point operands in each of four
data formats:


• F_floating (VAX single)
• G_floating (VAX double, 11-bit exponent)
• S_floating (IEEE single)
• T_floating (IEEE double, 11-bit exponent)


Data conversion instructions are also provided to convert operands between floating- 
point and quadword integer formats, between double and single floating, and 
between quadword and longword integers. 


NOTE

D_floating is a partially supported datatype; no D_floating
arithmetic operations are provided in the
architecture. For backward compatibility, exact D_floating
arithmetic may be provided via software
emulation. D_floating “format compatibility,” in which
binary files of D_floating numbers may be processed
but without the last 3 bits of fraction precision, can
be obtained via conversions to G_floating, G arithmetic
operations, then conversion back to D_floating.


The choice of data formats is encoded in each instruction. Each instruction also 
encodes the choice of rounding mode and the choice of trapping mode. 


All floating-point operate instructions (that is, not including loads or stores) that 
yield an F_ or G_floating zero result must materialize a true zero. 
4.7.1 Floating Subsets and Floating Faults 


All floating-point operations may take floating disabled faults. Any subsetted 
floating-point instruction may take an Illegal Instruction Trap. These faults are 
not explicitly listed in the description of each instruction. 


All floating-point loads and stores may take memory management faults (access 
control violation, translation not valid, fault on read/write, data alignment). 


The Floating-point Enable (FEN) internal processor register (IPR) allows system 
software to restrict access to the floating registers. 


If a floating instruction is implemented and FEN = 0, attempts to execute the 
instruction cause a floating disabled fault. 


If a floating instruction is not implemented, attempts to execute the instruction 
cause an Illegal Instruction Trap. This rule holds regardless of the value of FEN. 


An Alpha implementation may provide both VAX and IEEE floating-point operations,
either, or none.

Some floating-point instructions are common to the VAX and IEEE subsets, some
are VAX only, and some are IEEE only. These are designated in the descriptions
that follow. If either subset is implemented, all the common instructions must be
implemented.


An implementation including IEEE floating-point may subset the ability to perform 
rounding to plus infinity and minus infinity. If not implemented, instructions 
requesting these rounding modes take Illegal Instruction Trap. 


4.7.2 Definitions 
The following definitions apply to Alpha floating-point support. 


true result 

The mathematically correct result of an operation, assuming that the input operand 
values are exact. The true result is typically rounded to the nearest representable 
result. 


representable result 
A real number that can be represented exactly as a VAX or IEEE floating-point
number, with finite precision and bounded exponent range. 


LSB

The least significant bit. For a positive representable number A whose fraction is 
not all ones, A + 1 LSB is the next larger representable number, and A + 1/2 LSB 
is exactly halfway between A and the next larger representable number. 


true zero 
The value +0, represented as exactly 64 zeros in a floating-point register. 


Alpha finite number

A floating-point number with a definite, in-range value. Specifically, all numbers
in the inclusive ranges -MAX..-MIN, zero, +MIN..+MAX, where MAX is the largest
non-infinite representable floating-point number and MIN is the smallest non-zero
representable normalized floating-point number.


For VAX floating-point, finites do not include reserved operands or dirty zeros (this
differs from the usual VAX interpretation of dirty zeros as finite). For IEEE
floating-point, finites do not include infinities, NaNs, or denormals, but do include
minus zero.


Not-a-Number 

An IEEE floating-point bit pattern that represents something other than a number.
This comes in two forms: signaling NaNs (for Alpha, those with an initial fraction
bit of 0) and quiet NaNs (for Alpha, those with initial fraction bit of 1).


infinity 
An IEEE floating-point bit pattern that represents plus or minus infinity. 


denormal

An IEEE floating-point bit pattern that represents a number whose magnitude lies
between zero and the smallest finite number.


dirty zero 


A VAX floating-point bit pattern that represents a zero value, but not in true-zero 
form. 


reserved operand 
A VAX floating-point bit pattern that represents an illegal value. 


trap shadow 


The set of instructions potentially executed after an instruction that signals an 
arithmetic trap but before the trap is actually taken. 


4.7.3 Encodings 


Floating-point numbers are represented with three fields: sign, exponent, and 
fraction. The sign is 1 bit; the exponent is 8 or 11 bits; and the fraction is 23, 
52, or 55 bits. Some encodings represent special values: 


                              VAX             VAX      IEEE          IEEE
Sign   Exponent   Fraction    Meaning         Finite   Meaning       Finite
x      All-1's    Non-zero    Finite          Yes      +/-NaN        No
x      All-1's    0           Finite          Yes      +/-Infinity   No
0      0          Non-zero    Dirty zero      No       +Denormal     No
1      0          Non-zero    Resv. operand   No       -Denormal     No
0      0          0           True zero       Yes      +0            Yes
1      0          0           Resv. operand   No       -0            Yes
x      Other      x           Finite          Yes      finite        Yes


The values of MIN and MAX for each of the four floating-point data formats are: 


Data Format   MIN              MAX
F_floating    2**-127 * 0.5    2**127 * (1.0 - 2**-24)
              (0.294e-38)      (1.70e38)
G_floating    2**-1023 * 0.5   2**1023 * (1.0 - 2**-53)
              (0.56e-308)      (0.899e308)
S_floating    2**-126 * 1.0    2**127 * (2.0 - 2**-23)
              (1.175e-38)      (3.40e38)
T_floating    2**-1022 * 1.0   2**1023 * (2.0 - 2**-52)
              (2.225e-308)     (1.798e308)
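
The MIN and MAX expressions can be checked numerically. The following is a minimal Python sketch, not part of the architecture definition. It assumes the MIN exponents for F_floating and G_floating are negative (2**-127 and 2**-1023), which is what the decimal values in parentheses imply. The name `FORMATS` is hypothetical.

```python
import sys

# (MIN, MAX) for each data format, evaluated as Python doubles.
# Every value here is exactly representable in a double, so nothing is lost.
FORMATS = {
    "F_floating": (2.0**-127 * 0.5, 2.0**127 * (1.0 - 2.0**-24)),
    "G_floating": (2.0**-1023 * 0.5, 2.0**1023 * (1.0 - 2.0**-53)),
    "S_floating": (2.0**-126 * 1.0, 2.0**127 * (2.0 - 2.0**-23)),
    "T_floating": (2.0**-1022 * 1.0, 2.0**1023 * (2.0 - 2.0**-52)),
}

for name, (minimum, maximum) in FORMATS.items():
    print(f"{name}: MIN = {minimum:.3e}  MAX = {maximum:.3e}")
```

On a host whose `float` is an IEEE double, the T_floating pair coincides with `sys.float_info.min` and `sys.float_info.max`, which gives a quick cross-check of that table row.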


4.7.4 Floating-Point Rounding Modes 


All rounding modes map a true result that is exactly representable to that 
representable value. 


VAX Rounding Modes 


For VAX floating-point operations, two rounding modes are provided and are 
specified in each instruction: normal (biased) rounding and chopped rounding. 


Normal VAX rounding maps the true result to the nearest of two representable
results, with true results exactly halfway between mapped to the larger in absolute
value (sometimes called biased rounding away from zero); maps true results
> MAX + 1/2 LSB in magnitude to an overflow; maps true results < MIN - 1/2 LSB
in magnitude to an underflow.


Chopped VAX rounding maps the true result to the smaller in magnitude of two 
surrounding representable results; maps true results > MAX + 1 LSB in magnitude 
to an overflow; maps true results < MIN in magnitude to an underflow. 


IEEE Rounding Modes 

For IEEE floating-point operations, four rounding modes are provided: normal
rounding (unbiased round to nearest), rounding toward minus infinity, rounding toward
zero, and rounding toward plus infinity. The first three can be specified in the
instruction. Rounding toward plus infinity can be obtained by setting the
Floating-point Control Register (FPCR) to select it and then specifying dynamic
rounding mode in the instruction (see Section 4.7.7). Alpha IEEE arithmetic does
rounding before detecting overflow/underflow.


Normal IEEE rounding maps the true result to the nearest of two representable
results, with true results exactly halfway between mapped to the one whose
fraction ends in 0 (sometimes called unbiased rounding to even); maps true results
> MAX + 1/2 LSB in magnitude to an overflow; maps true results < MIN - 1/2 LSB
in magnitude to an underflow.


Plus infinity IEEE rounding maps the true result to the larger of two surrounding
representable results; maps true results > MAX in magnitude to an overflow; maps
positive true results < +MIN - 1 LSB to an underflow; and maps negative true
results > -MIN to an underflow.


Minus infinity IEEE rounding maps the true result to the smaller of two surrounding
representable results; maps true results > MAX in magnitude to an overflow; maps
positive true results < +MIN to an underflow; and maps negative true results
> -MIN + 1 LSB to an underflow.


Chopped IEEE rounding maps the true result to the smaller in magnitude of two
surrounding representable results; maps true results > MAX + 1 LSB in magnitude
to an overflow; and maps non-zero true results < MIN in magnitude to an underflow.


Dynamic rounding mode uses the IEEE rounding mode selected by the FPCR register
and is described in more detail in Section 4.7.7.
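
The difference between the rounding modes shows up most clearly on values that fall exactly halfway between two representable results. The sketch below is illustrative only. It measures the true result in units of 1 LSB, so the representable results are the integers, and it does not model overflow or underflow detection. The function name `round_to_lsb` and the mode strings are hypothetical.

```python
from fractions import Fraction
import math


def round_to_lsb(true_result, mode):
    """Round an exact value, expressed in units of 1 LSB, to a whole number of LSBs."""
    x = Fraction(true_result)
    lo, hi = math.floor(x), math.ceil(x)
    if lo == hi:                      # exactly representable: every mode agrees
        return lo
    if mode == "minus_infinity":      # smaller of the two surrounding results
        return lo
    if mode == "plus_infinity":       # larger of the two surrounding results
        return hi
    if mode == "chopped":             # smaller in magnitude
        return lo if x > 0 else hi
    if mode not in ("vax_normal", "ieee_normal"):
        raise ValueError(f"unknown rounding mode: {mode}")
    if 2 * x < lo + hi:               # strictly below the halfway point
        return lo
    if 2 * x > lo + hi:               # strictly above the halfway point
        return hi
    if mode == "vax_normal":          # tie: larger in absolute value
        return hi if x > 0 else lo
    return lo if lo % 2 == 0 else hi  # tie: result whose last bit is 0


for mode in ("vax_normal", "ieee_normal", "chopped", "plus_infinity", "minus_infinity"):
    print(mode, round_to_lsb(Fraction(5, 2), mode), round_to_lsb(Fraction(-5, 2), mode))
```

For the halfway case 5/2, normal VAX rounding gives 3 while normal IEEE rounding gives 2. The two directed modes and chopped rounding differ only in how they treat the sign.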


The following tables summarize the floating-point rounding modes: 


VAX Rounding Mode     Instruction Notation
Normal rounding       (No modifier)
Chopped               /C

IEEE Rounding Mode    Instruction Notation
Normal rounding       (No modifier)
Dynamic rounding      /D
Plus infinity         /D and ensure that FPCR<DYN> = '11'
Minus infinity        /M
Chopped               /C


4.7.5 Floating-Point Trapping Modes 


There are six exceptions that can be generated by floating-point operate instructions, 
all signaled by an arithmetic exception trap. These exceptions are: 


• Invalid operation

• Division by zero

• Overflow

• Underflow, may be disabled

• Inexact result, may be disabled

• Integer overflow (conversion to integer only), may be disabled


For more detail on the information passed to an arithmetic exception handler, see 
Part II, Operating Systems. 


VAX Trapping Modes 

For VAX floating-point operations other than CVTxQ, four trapping modes are 
provided. They specify software completion and whether traps are enabled for 
underflow. 


For VAX conversions from floating-point to integer, four trapping modes are provided.
They specify software completion and whether traps are enabled for integer overflow.


IEEE Trapping Modes

For IEEE floating-point operations other than CVTxQ, four trapping modes are 
provided. They specify software completion and whether traps are enabled for 
underflow and inexact results. 


For IEEE conversions from floating-point to integer, four trapping modes are 
provided. They specify software completion, and whether traps are enabled for 
integer overflow and inexact results. 


The modes and instruction notation are: 


VAX Trap Mode                                   Instruction Notation
Imprecise, underflow disabled                   (No modifier)
Imprecise, underflow enabled                    /U
Software, underflow disabled                    /S
Software, underflow enabled                     /SU

VAX Convert-to-Integer Trap Mode                Instruction Notation
Imprecise, integer overflow disabled            (No modifier)
Imprecise, integer overflow enabled             /V
Software, integer overflow disabled             /S
Software, integer overflow enabled              /SV

IEEE Trap Mode                                  Instruction Notation
Imprecise, unfl disabled, inexact disabled      (No modifier)
Imprecise, unfl enabled, inexact disabled       /U
Software, unfl enabled, inexact disabled        /SU
Software, unfl enabled, inexact enabled         /SUI

IEEE Convert-to-Integer Trap Mode               Instruction Notation
Imprecise, int.ovfl disabled, inexact disabled  (No modifier)
Imprecise, int.ovfl enabled, inexact disabled   /V
Software, int.ovfl enabled, inexact disabled    /SV
Software, int.ovfl enabled, inexact enabled     /SVI


4.7.5.1 Imprecise/Software Completion Trap Modes


Floating-point instructions may be pipelined, and all exceptions are imprecise traps:


• The trapping instruction may write an UNPREDICTABLE result value.

• The trap PC is an arbitrary number of instructions past the one triggering
  the trap. The trigger instruction plus all intervening executed instructions are
  collectively referred to as the trap shadow of the trigger instruction.

• The extent of the trap shadow is bounded only by a TRAPB instruction (or the
  implicit TRAPB within a CALL_PAL instruction).

• Input operand values may have been overwritten in the trap shadow.

• Result values may have been overwritten in the trap shadow.

• An UNPREDICTABLE result value may have been used as an input operand in
  the trap shadow.

• Additional traps may occur in the trap shadow.

• In general, it is not feasible to fix up the result value or to continue from the
  trap.


This behavior is ideal for operations on finite operands that give finite results. For 
programs that deliberately operate outside the overflow/underflow range, or use 
IEEE NaNs, software assistance is required to complete floating-point operations 
correctly. This assistance can be provided by a software arithmetic trap handler, 
plus constraints on the instructions surrounding the trap. 


For a trap handler to complete non-finite arithmetic, the following conditions must 
hold: 


1.  On entry to the trap shadow, if any Alpha register or memory location contains
    a value that is used as an operand value by some instruction in the trap shadow
    (live on entry), then no instruction in the trap shadow may modify the register
    or memory location.

2.  Within the trap shadow, the computation of the base register for a memory load
    or store instruction may not involve using the result of an instruction that might
    generate an UNPREDICTABLE result.

3.  Within the trap shadow, no register may be used more than once as a destination
    register.

4.  The trap shadow may not include any branch instructions.

5.  Each floating instruction to be completed must be so marked, by specifying the
    /S software completion modifier.


The first condition allows a software trap handler to emulate the trigger instruction 
with its original input operand values and then to reexecute the rest of the trap 
shadow. 


The second condition prevents memory accesses at unpredictable addresses. 


The remaining conditions make it possible for a software trap handler to find the
trigger instruction via a linear scan backwards from the trap PC.
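
A code generator or verifier could check some of these conditions mechanically. The following Python sketch is an illustration under simplifying assumptions, not a description of any Digital tool. Each instruction is reduced to a hypothetical `Insn` record (opcode, one optional destination register, source registers), memory locations are not modeled, and only conditions 1, 3, and 4 are checked. `BRANCHES` is an illustrative subset of branch mnemonics.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Illustrative subset of branch mnemonics; a real checker needs the full list.
BRANCHES = {"BR", "BSR", "BEQ", "BNE", "FBEQ", "FBNE", "JMP", "JSR", "RET"}


@dataclass
class Insn:
    opcode: str                   # for example "ADDT/S"
    dest: Optional[str]           # destination register, or None
    srcs: Tuple[str, ...] = ()    # source registers


def check_trap_shadow(shadow: List[Insn]) -> List[str]:
    """Return the violations of conditions 1, 3, and 4 found in a trap shadow."""
    problems = []
    written = set()        # registers written so far in the shadow
    live_on_entry = set()  # registers read before any write in the shadow
    for insn in shadow:
        if insn.opcode.split("/")[0] in BRANCHES:
            problems.append(f"condition 4: branch {insn.opcode} in trap shadow")
        for reg in insn.srcs:
            if reg not in written:
                live_on_entry.add(reg)
        if insn.dest is not None:
            if insn.dest in written:
                problems.append(f"condition 3: {insn.dest} is a destination more than once")
            written.add(insn.dest)
    for reg in sorted(live_on_entry & written):
        problems.append(f"condition 1: live-on-entry register {reg} is modified")
    return problems


ok = [Insn("ADDT/S", "F2", ("F0", "F1")), Insn("MULT/S", "F3", ("F2", "F0"))]
bad = [Insn("ADDT/S", "F0", ("F0", "F1")), Insn("FBEQ", None, ("F0",))]
print(check_trap_shadow(ok))
print(check_trap_shadow(bad))
```

The second sequence overwrites F0, which was live on entry, and contains a branch, so a handler could neither re-execute the shadow with the original operands nor find the trigger instruction by a linear backward scan.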


NOTE 

The /S modifier does not affect instruction operation 
or trap behavior; it is an informational bit passed to 
a software trap handler. It allows a trap handler to 
test easily whether an instruction is intended to be 
completed. (The /S bits of instructions signaling traps 
are carried into the trap summary.) The handler may 
then assume that the other conditions are met without 
examining the code stream. 


If a software trap handler is provided, it must handle the completion of all
floating-point operations marked /S that follow the rules above. In effect, one TRAPB
instruction per basic block can be used.


4.7.5.2 Invalid Operation Arithmetic Trap


An invalid operation arithmetic trap is signaled if any operand of a floating
arithmetic-operate instruction is non-finite. (CMPTxy is an exception to the rule
and operates normally with plus and minus infinity and does not trap in this case.)
This trap is always enabled. If this trap occurs, an UNPREDICTABLE value is
stored in the result register. (IEEE-compliant system software must also supply an
invalid operation indication to the user for SQRT of a negative non-zero number,
0/0, x REM 0, and conversions to integer that take an integer overflow trap.)


4.7.5.3 Division by Zero Arithmetic Trap


A division by zero arithmetic trap is taken if the numerator does not cause an invalid
operation trap and the denominator is zero. This trap is always enabled. If this trap
occurs, an UNPREDICTABLE value is stored in the result register.


4.7.5.4 Overflow Arithmetic Trap


An overflow arithmetic trap is signaled if the rounded result exceeds in magnitude
the largest finite number of the destination format. This trap is always enabled. If
this trap occurs, an UNPREDICTABLE value is stored in the result register.


4.7.5.5 Underflow Arithmetic Trap


An underflow occurs if the rounded result is smaller in magnitude than the smallest 
finite number of the destination format. 


If an underflow occurs, a true zero (64 bits of zero) is always stored in the result 
register, even if the proper IEEE result would have been —0 (underflow below the 
negative denormal range). 


If an underflow occurs and underflow traps are enabled by the instruction, an 
underflow arithmetic trap is signaled. 


4.7.5.6 Inexact Result Arithmetic Trap


An inexact result occurs if the infinitely precise result differs from the rounded
result.


If an inexact result occurs, the normal rounded result is still stored in the result
register.


If an inexact result occurs and inexact result traps are enabled by the instruction,
an inexact result arithmetic trap is signaled.


4.7.5.7 Integer Overflow Arithmetic Trap 


In conversions from floating to quadword integer, an integer overflow occurs if the
rounded result is outside the range -2**63..2**63-1. In conversions from quadword
integer to longword integer, an integer overflow occurs if the result is outside the
range -2**31..2**31-1.


If an integer overflow occurs in CVTxQ or CVTQL, the true result truncated to the
low-order 64 or 32 bits respectively is stored in the result register.


If an integer overflow occurs and integer overflow traps are enabled by the
instruction, an integer overflow arithmetic trap is signaled.
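
The truncation rule can be modeled directly. The sketch below is a hedged illustration in Python: it takes an already-rounded integer result and a destination width, reports whether the value is out of range, and returns the low-order bits reinterpreted as a signed two's-complement integer. The helper name `store_truncated` is hypothetical, and the trap itself is not modeled.

```python
from typing import Tuple


def store_truncated(true_result: int, bits: int) -> Tuple[int, bool]:
    """Return (stored value, overflow flag) for a signed destination of `bits` bits."""
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    overflow = not (lowest <= true_result <= highest)
    low = true_result & ((1 << bits) - 1)   # keep only the low-order bits
    if low & (1 << (bits - 1)):             # reinterpret as two's complement
        low -= 1 << bits
    return low, overflow


print(store_truncated(2**31, 32))      # quadword-to-longword case, out of range
print(store_truncated(2**63 + 5, 64))  # floating-to-quadword case, out of range
print(store_truncated(-1, 32))         # in range, stored unchanged
```

A value one past the positive limit wraps to the most negative value of the destination, which is why software that disables the integer overflow trap has to check the range itself if it cares about the distinction.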


4.7.6 Floating-Point Single-Precision Operations 


Single-precision values (F_floating or S_floating) are stored in the floating registers 
in canonical form, as subsets of double-precision values, with 11-bit exponents 
restricted to the corresponding single-precision range, and with the 29 low-order 
fraction bits restricted to be all zero. 


Single-precision operations applied to canonical single-precision values give single- 
precision results. Single-precision operations applied to non-canonical operands give 
UNPREDICTABLE results. 


Longword integer values in floating registers are stored in bits <63:62,58:29>, with 
bits <61:59> ignored and zeros in bits <28:0>. 
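
For S_floating, the canonical-form rule can be observed from any host that uses IEEE single and double formats. The Python sketch below is an assumption-laden illustration rather than a definition. It treats the register image of an S_floating value as the IEEE double that has the same value, and it derives the exponent bounds 897..1150 from the standard IEEE biases (127 for single, 1023 for double), with 0 and 2047 left for the special encodings. The helper names are hypothetical.

```python
import struct


def double_image(value: float) -> int:
    """64-bit pattern of a Python float, interpreted as a T_floating image."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def to_single(value: float) -> float:
    """Round a Python float to IEEE single precision and widen it again."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def is_canonical_s(image: int) -> bool:
    """Check the two restrictions stated above for an S_floating register image."""
    low_fraction_zero = image & ((1 << 29) - 1) == 0
    exponent = (image >> 52) & 0x7FF
    exponent_in_range = exponent in (0, 0x7FF) or 897 <= exponent <= 1150
    return low_fraction_zero and exponent_in_range


print(hex(double_image(0.1)), is_canonical_s(double_image(0.1)))
print(hex(double_image(to_single(0.1))), is_canonical_s(double_image(to_single(0.1))))
```

The double nearest 0.1 has non-zero bits among its 29 low-order fraction bits and is therefore not canonical. After rounding to single precision, those bits are zero.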


4.7.7 FPCR Register and Dynamic Rounding Mode 


When an IEEE floating-point operate instruction specifies dynamic mode (/D) in its 
function field (function code bits <7:6> = 11), the rounding mode to be used for the 
instruction is derived from the FPCR register. The layout of the rounding mode bits 
and their assignments matches exactly the format used in the 11-bit function field 
of the floating-point operate instructions. 


In addition, the FPCR gives a summary for each exception type of the exception
conditions detected by all IEEE floating-point operates thus far, as well as an
overall summary bit that indicates whether any of these exception conditions has
been detected. The individual exception bits match exactly in purpose and order
the exception bits found in the exception summary quadword that is pushed for
arithmetic traps. However, for each instruction, these exception bits are set
independent of the trapping mode specified for the instruction. Therefore, even
though trapping may be disabled for a certain exceptional condition, the fact that
the exceptional condition was encountered by an instruction will still be recorded in
the FPCR.


Floating-point operates that belong to the IEEE subset and CVTQL, which belongs 
to both VAX and IEEE subsets, appropriately set the FPCR exception bits. It is 
UNPREDICTABLE whether floating-point operates that belong only to the VAX 
floating-point subset set the FPCR exception bits. 


Alpha floating-point hardware only transitions these exception bits from zero to one. 
Once set to one, these exception bits are only cleared when software writes zero into 
these bits by writing a new value into the FPCR. 


The format of the FPCR is shown in Figure 4—1 and described in Table 4-8. 


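Figure 4-1: Floating-Point Control Register (FPCR) Format


 63  62    60 59 58  57  56  55  54  53  52  51                0
+---+--------+-----+---+---+---+---+---+---+-------------------+
|SUM|RAZ/IGN | DYN |IOV|INE|UNF|OVF|DZE|INV|      RAZ/IGN      |
+---+--------+-----+---+---+---+---+---+---+-------------------+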


Table 4-8: Floating-Point Control Register (FPCR) Bit Descriptions

Bit     Description

63      Summary Bit (SUM). Records bitwise OR of FPCR exception bits. Equal to
        (FPCR[57] | FPCR[56] | FPCR[55] | FPCR[54] | FPCR[53] | FPCR[52]).

62-60   Reserved. Read As Zero; Ignored when written.

59-58   Dynamic Rounding Mode (DYN). Indicates the rounding mode to be used by
        an IEEE floating-point operate instruction when the instruction's function field
        specifies dynamic mode (/D). Assignments are:

        DYN   IEEE Rounding Mode Selected
        00    Chopped rounding mode
        01    Minus infinity
        10    Normal rounding
        11    Plus infinity

57      Integer Overflow (IOV). An integer arithmetic operation or a conversion from
        floating to integer overflowed the destination precision.

56      Inexact Result (INE). A floating arithmetic or conversion operation gave a result
        that differed from the mathematically exact result.

55      Underflow (UNF). A floating arithmetic or conversion operation underflowed the
        destination exponent.

54      Overflow (OVF). A floating arithmetic or conversion operation overflowed the
        destination exponent.

53      Division by Zero (DZE). An attempt was made to perform a floating divide
        operation with a divisor of zero.

52      Invalid Operation (INV). An attempt was made to perform a floating arithmetic,
        conversion, or comparison operation, and one or more of the operand values were
        illegal.

51-0    Reserved. Read As Zero; Ignored when written.
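
The bit assignments in Table 4-8 translate directly into masks and shifts. The Python sketch below is an illustrative model, not an emulator. It decodes a 64-bit FPCR value into named fields and models the rule that hardware only moves exception bits from zero to one, with SUM recording the OR of bits 57 through 52. The names `decode_fpcr` and `record_exceptions` are hypothetical.

```python
# Bit positions taken from Table 4-8.
FPCR_BITS = {"SUM": 63, "IOV": 57, "INE": 56, "UNF": 55, "OVF": 54, "DZE": 53, "INV": 52}
DYN_MODES = {0b00: "chopped", 0b01: "minus infinity", 0b10: "normal", 0b11: "plus infinity"}
EXCEPTION_MASK = 0x3F << 52          # bits 57..52


def decode_fpcr(fpcr: int) -> dict:
    """Split an FPCR value into its single-bit flags and the DYN rounding mode."""
    fields = {name: (fpcr >> bit) & 1 for name, bit in FPCR_BITS.items()}
    fields["DYN"] = DYN_MODES[(fpcr >> 58) & 0b11]
    return fields


def record_exceptions(fpcr: int, new_bits: int) -> int:
    """Model hardware recording exceptions: bits are only ever set, never cleared."""
    fpcr |= new_bits & EXCEPTION_MASK
    if fpcr & EXCEPTION_MASK:
        fpcr |= 1 << 63              # SUM is the OR of the six exception bits
    return fpcr


fpcr = 0b11 << 58                    # DYN = 11, all exception bits clear
fpcr = record_exceptions(fpcr, 1 << 56)
print(decode_fpcr(fpcr))
```

In this model only an explicit write of a new FPCR value can clear INE or SUM again, which mirrors the sticky behavior described above.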


FPCR is read from and written to the floating-point registers by the MF_FPCR and
MT_FPCR instructions, respectively, which are described in Section 4.7.7.1.


FPCR and the instructions to access it are required for an implementation that
supports floating-point (see Section 4.1.1.1). On implementations that do not support
floating-point, the instructions that access FPCR (MF_FPCR and MT_FPCR) take
an Illegal Instruction Trap.


SOFTWARE NOTE 
As noted in Section 4.1.1.1, support for FPCR is 
required on a system that supports the OpenVMS Alpha 
operating system even if that system does not support 
floating-point. 


4.7.7.1 Accessing the FPCR 


Because Alpha floating-point hardware can overlap the execution of a number of 
floating-point instructions, accessing the FPCR must be synchronized with other 
floating-point instructions. A TRAPB must be issued both prior to and after accessing 
the FPCR to ensure that the FPCR access is synchronized with the execution of 
previous and subsequent floating-point instructions; otherwise synchronization is 
not ensured. 


Issuing a TRAPB followed by an MT_FPCR followed by another TRAPB ensures
that only floating-point instructions issued after the second TRAPB are affected
by and affect the new value of the FPCR. Issuing a TRAPB followed by an
MF_FPCR followed by another TRAPB ensures that the value read from the FPCR only
records the exception information for floating-point instructions issued prior to the
first TRAPB.


Consider the following example: 

        ADDT/D

        TRAPB                   ;1
        MT_FPCR F1,F1,F1

        TRAPB                   ;2
        SUBT/D


Without the first TRAPB, it is possible in an implementation for the ADDT/D
to execute in parallel with the MT_FPCR. Thus, it would be UNPREDICTABLE
whether the ADDT/D was affected by the new rounding mode set by the
MT_FPCR and whether fields cleared by the MT_FPCR in the exception summary were
subsequently set by the ADDT/D.


Without the second TRAPB, it is possible in an implementation for the MT_FPCR to 
execute in parallel with the SUBT/D. Thus, it would be UNPREDICTABLE whether 
the SUBT/D was affected by the new rounding mode set by the MT_FPCR and 
whether fields cleared by the MT_FPCR in the exception summary field of FPCR 
were previously set by the SUBT/D. 


4.7.7.2 Default Values of the FPCR 
Processor initialization leaves the value of FPCR UNPREDICTABLE. 


SOFTWARE NOTE 
Digital software should initialize FPCR<DYN> = 11 
during program activation. Using this default, interval 
arithmetic code can switch from plus to minus infinity 
rounding with no penalty in performance by using /M 
and /D qualifiers. 


Program activation should clear all other fields of the
FPCR.


4.7.7.3 Saving and Restoring the FPCR


The FPCR must be saved and restored across context switches so that the FPCR 
value of one process does not affect the rounding behavior and exception summary 
of another process. 


The dynamic rounding mode put into effect by the programmer (or initialized by 
image activation) is valid for the entirety of the program and remains in effect until 
subsequently changed by the programmer or until image run-down occurs. 


SOFTWARE NOTE 
The IEEE standard precludes saving and restoring the 
FPCR across subroutine calls. 


4.7.8 IEEE Standard 


The IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Standard
754-1985) is included by reference.


4.8 Memory Format Floating-Point Instructions


The instructions in this section move data between the floating-point registers and 
memory. They use the Memory instruction format. They do not interpret the bits 
moved in any way; specifically, they do not trap on non-finite values. 


The instructions are summarized in Table 4-9. 


Table 4-9: Memory Format Floating-Point Instructions Summary 


Mnemonic   Operation                                      Subset
LDF        Load F_floating                                VAX
LDG        Load G_floating (Load D_floating)              VAX
LDS        Load S_floating (Load Longword Integer)        Both
LDT        Load T_floating (Load Quadword Integer)        Both
STF        Store F_floating                               VAX
STG        Store G_floating (Store D_floating)            VAX
STS        Store S_floating (Store Longword Integer)      Both
STT        Store T_floating (Store Quadword Integer)      Both



4.8.1 Load F_floating 


Format: 


LDF Fa.wf,disp.ab(Rb.ab) !Memory format


Operation:


va ← {Rbv + SEXT(disp)}


Fa ← (va)<15> || MAP_F((va)<14:7>) ||
     (va)<6:0> || (va)<31:16> || 0<28:0>


Exceptions: 


Access Violation 
Fault on Read 
Alignment 
Translation Not Valid 


Instruction mnemonics: 


LDF Load F_floating 


Qualifiers: 


None 


Description: 


LDF fetches an F_floating datum from memory and writes it to register Fa. If the
data is not naturally aligned, an alignment exception is generated.


The 8-bit memory-format exponent is expanded to an 11-bit register-format exponent
according to Table 2-1.


The virtual address is computed by adding register Rb to the sign-extended 16-bit
displacement. The source operand is fetched from memory and the bytes are
reordered to conform to the F_floating register format. The result is then
zero-extended in the low-order longword and written to register Fa.


4.8.2 Load G_floating


Format:


LDG Fa.wg,disp.ab(Rb.ab) !Memory format


Operation:


va ← {Rbv + SEXT(disp)}


Fa ← (va)<15:0> || (va)<31:16> ||
     (va)<47:32> || (va)<63:48>


Exceptions:


Access Violation
Fault on Read


Alignment 
Translation Not Valid 


Instruction mnemonics: 


LDG Load G_floating (Load D_floating)


Qualifiers: 


None 


Description: 


LDG fetches a G_floating (or D_floating) datum from memory and writes it to register
Fa. If the data is not naturally aligned, an alignment exception is generated.


The virtual address is computed by adding register Rb to the sign-extended 16-bit
displacement. The source operand is fetched from memory, the bytes are reordered to
conform to the G_floating register format (also conforming to the D_floating register
format), and the result is then written to register Fa.
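
The reordering in the Operation section amounts to reversing the order of the four 16-bit words of the quadword. A minimal Python sketch of that permutation follows. The function name `swap_words` is hypothetical, and the quadword is passed as an integer whose bit 0 is (va)<0>.

```python
def swap_words(quadword: int) -> int:
    """Reverse the four 16-bit words: <15:0> becomes <63:48>, and so on."""
    w0 = quadword & 0xFFFF            # (va)<15:0>
    w1 = (quadword >> 16) & 0xFFFF    # (va)<31:16>
    w2 = (quadword >> 32) & 0xFFFF    # (va)<47:32>
    w3 = (quadword >> 48) & 0xFFFF    # (va)<63:48>
    return (w0 << 48) | (w1 << 32) | (w2 << 16) | w3


memory_image = 0x1111222233334444
register_image = swap_words(memory_image)
print(hex(register_image))
print(hex(swap_words(register_image)))
```

Because the permutation is its own inverse, the same function also models the reordering that STG performs when it writes the register back to memory.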


4.8.3 Load S_floating 


Format: 


LDS Fa.ws,disp.ab(Rb.ab) !Memory format


Operation:


va ← {Rbv + SEXT(disp)}


Fa ← (va)<31> || MAP_S((va)<30:23>) ||
     (va)<22:0> || 0<28:0>


Exceptions: 


Access Violation 
Fault on Read 
Alignment 
Translation Not Valid 


Instruction mnemonics: 


LDS Load S_floating (Load Longword Integer)


Qualifiers: 


None 


Description: 


LDS fetches a longword (integer or S_floating) from memory and writes it to register 
Fa. If the data is not naturally aligned, an alignment exception is generated. 


The 8-bit memory-format exponent is expanded to an 11-bit register-format exponent 
according to Table 2—2. 


The virtual address is computed by adding register Rb to the sign-extended 16-bit 
displacement. The source operand is fetched from memory, is zero-extended in the 
low-order longword, and then written to register Fa. 


Notes: 


• Longword integers in floating registers are stored in bits <63:62,58:29>, with bits
  <61:59> ignored and zeros in bits <28:0>.


4.8.4 Load T_floating


Format: 


LDT Fa.wt,disp.ab(Rb.ab) !Memory format


Operation:


va ← {Rbv + SEXT(disp)}
Fa ← (va)<63:0>


Exceptions: 


Access Violation 
Fault on Read 


Alignment 
Translation Not Valid 
Instruction mnemonics: 


LDT Load T_floating (Load Quadword Integer) 


Qualifiers: 


None 


Description: 
LDT fetches a quadword (integer or T_floating) from memory and writes it to register 
Fa. If the data is not naturally aligned, an alignment exception is generated. 


The virtual address is computed by adding register Rb to the sign-extended 16-bit
displacement. The source operand is fetched from memory and written to register
Fa.




4.8.5 Store F_floating 


Format: 


STF Fa.rf,disp.ab(Rb.ab) !Memory format 


Operation: 

va ← {Rbv + SEXT(disp)}

(va)<31:0> ← Fav<44:29> || Fav<63:62> || Fav<58:45>

Exceptions: 


Access Violation 
Fault on Write 
Alignment 
Translation Not Valid 


Instruction mnemonics: 


STF Store F_floating 


Qualifiers: 


None 


Description: 


STF stores an F_floating datum from Fa to memory. If the data is not naturally
aligned, an alignment exception is generated.


The virtual address is computed by adding register Rb to the sign-extended 16-bit
displacement. The bits of the source operand are fetched from register Fa, the bits
are reordered to conform to F_floating memory format, and the result is then written
to memory. Bits <61:59> and <28:0> of Fa are ignored. No checking is done.


4.8.6 Store G_floating


Format:


STG Fa.rg,disp.ab(Rb.ab) !Memory format


Operation:


va ← {Rbv + SEXT(disp)}


(va)<63:0> ← Fav<15:0> || Fav<31:16> ||
             Fav<47:32> || Fav<63:48>


Exceptions:


Access Violation
Fault on Write
Alignment
Translation Not Valid 


Instruction mnemonics: 


STG Store G_floating (Store D_floating) 


Qualifiers: 


None 


Description: 


STG stores a G_floating (or D_floating) datum from Fa to memory. If the data is not 
naturally aligned, an alignment exception is generated. 


The virtual address is computed by adding register Rb to the sign-extended 16-bit
displacement. The source operand is fetched from register Fa, the bytes are
reordered to conform to the G_floating memory format (also conforming to the
D_floating memory format), and the result is then written to memory.


4.8.7 Store S_floating


Format: 


STS Fa.rs,disp.ab(Rb.ab) !Memory format


Operation:


va ← {Rbv + SEXT(disp)}

(va)<31:0> ← Fav<63:62> || Fav<58:29>

Exceptions: 


Access Violation 
Fault on Write 
Alignment 
Translation Not Valid 


Instruction mnemonics: 


STS Store S_floating (Store Longword Integer) 


Qualifiers: 


None 


Description: 


STS stores a longword (integer or S_floating) datum from Fa to memory. If the data 
is not naturally aligned, an alignment exception is generated. 


The virtual address is computed by adding register Rb to the sign-extended 16-bit 
displacement. The bits of the source operand are fetched from register Fa, the bits 
are reordered to conform to S_floating memory format, and the result is then written 
to memory. Bits <61:59> and <28:0> of Fa are ignored. No checking is done. 
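
The STS bit selection can be sketched as follows. This Python illustration assumes that the register image of an S_floating value equals the IEEE double with the same value. It also shows the longword-integer register layout described in the LDS notes. The function names are hypothetical.

```python
import struct


def sts(fav: int) -> int:
    """(va)<31:0> <- Fav<63:62> || Fav<58:29>; bits <61:59> and <28:0> are ignored."""
    return (((fav >> 62) & 0x3) << 30) | ((fav >> 29) & 0x3FFFFFFF)


def longword_to_register(longword: int) -> int:
    """Place longword bits <31:30> in <63:62> and bits <29:0> in <58:29>."""
    longword &= 0xFFFFFFFF
    return (((longword >> 30) & 0x3) << 62) | ((longword & 0x3FFFFFFF) << 29)


def double_image(value: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def single_image(value: float) -> int:
    return struct.unpack("<I", struct.pack("<f", value))[0]


print(hex(sts(double_image(1.5))), hex(single_image(1.5)))
print(hex(sts(longword_to_register(0x80000001))))
```

For a value such as 1.5 that is exactly representable in single precision, the stored longword matches the host's IEEE single encoding. A longword integer placed in the register layout comes back unchanged.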


4.8.8 Store T_floating


Format:


STT Fa.rt,disp.ab(Rb.ab) !Memory format


Operation:


va ← {Rbv + SEXT(disp)}
(va)<63:0> ← Fav<63:0>


Exceptions:


Access Violation
Fault on Write
Alignment
Translation Not Valid

Instruction mnemonics: 

STT Store T_floating (Store Quadword Integer) 
Qualifiers: 


None 


Description: 


STT stores a quadword (integer or T_floating) datum from Fa to memory. If the data 
is not naturally aligned, an alignment exception is generated. 


The virtual address is computed by adding register Rb to the sign-extended 16-bit 
displacement. The source operand is fetched from register Fa and written to memory. 


4.9 Branch Format Floating-Point Instructions


Alpha provides six floating conditional branch instructions. These branch-format
instructions test the value of a floating-point register and conditionally change the
PC.


They do not interpret the bits tested in any way; specifically, they do not trap on 
non-finite values. 


The test is based on the sign bit and whether the rest of the register is all zero bits. 
All 64 bits of the register are tested. The test is independent of the format of the 
operand in the register. Both plus and minus zero are equal to zero. A non-zero 
value with a sign of zero is greater than zero. A non-zero value with a sign of one 
is less than zero. No reserved operand or non-finite checking is done. 
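
The test described above depends only on the sign bit and on whether the remaining 63 bits are all zero. The Python sketch below models it for a 64-bit register image. The function name `fbxx_taken` is hypothetical and the mnemonics are those listed in Table 4-10.

```python
SIGN_BIT = 1 << 63
MASK64 = (1 << 64) - 1


def fbxx_taken(fav: int, mnemonic: str) -> bool:
    """Return True if the named floating branch would be taken for register image fav."""
    fav &= MASK64
    is_zero = (fav & ~SIGN_BIT & MASK64) == 0          # +0 and -0 both count as zero
    negative = (not is_zero) and bool(fav & SIGN_BIT)  # non-zero with sign of one
    positive = (not is_zero) and not (fav & SIGN_BIT)  # non-zero with sign of zero
    outcomes = {
        "FBEQ": is_zero,
        "FBNE": not is_zero,
        "FBLT": negative,
        "FBLE": negative or is_zero,
        "FBGT": positive,
        "FBGE": positive or is_zero,
    }
    return outcomes[mnemonic]


print(fbxx_taken(0x8000000000000000, "FBEQ"))   # minus zero compares equal to zero
print(fbxx_taken(0x3FF0000000000000, "FBGT"))   # T_floating 1.0
print(fbxx_taken(0xFFF8000000000000, "FBLT"))   # a NaN pattern is just "negative"
```

The last call shows why the test is described as format independent: a bit pattern that IEEE would treat as a NaN is classified purely by its sign bit.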


The floating-point branch operations are summarized in Table 4-10. 


Table 4-10: Floating-Point Branch Instructions Summary 


Mnemonic   Operation                                Subset
FBEQ       Floating Branch Equal                    Both
FBGE       Floating Branch Greater Than or Equal    Both
FBGT       Floating Branch Greater Than             Both
FBLE       Floating Branch Less Than or Equal       Both
FBLT       Floating Branch Less Than                Both
FBNE       Floating Branch Not Equal                Both


4.9.1 Conditional Branch


Format: 


FBxx Fa.rq,disp.al !Branch format


Operation:


{update PC}
va ← PC + {4*SEXT(disp)}
IF TEST(Fav, Condition_based_on_Opcode) THEN
    PC ← va


Exceptions: 


None 


Instruction mnemonics: 


FBEQ Floating Branch Equal

FBGE Floating Branch Greater Than or Equal 
FBGT Floating Branch Greater Than 

FBLE Floating Branch Less Than or Equal 
FBLT Floating Branch Less Than 

FBNE Floating Branch Not Equal 


Qualifiers: 


None 


Description: 


Register Fa is tested. If the specified relationship is true, the PC is loaded with 
the target virtual address; otherwise, execution continues with the next sequential 
instruction. 


The displacement is treated as a signed longword offset. This means it is shifted 
left two bits (to address a longword boundary), sign-extended to 64 bits, and added 
to the updated PC to form the target virtual address. 


The conditional branch instructions are PC-relative only. The 21-bit signed 
displacement gives a forward/backward branch distance of +/- 1M instructions. 
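
The target address computation can be sketched as follows. This Python illustration assumes that "update PC" means advancing the PC by 4 bytes, the size of one 32-bit Alpha instruction, before the displacement is added. The function name `branch_target` is hypothetical.

```python
def branch_target(pc: int, disp: int) -> int:
    """Target virtual address for a branch at address pc with a 21-bit displacement."""
    updated_pc = pc + 4                   # assumed: PC already points past the branch
    disp &= (1 << 21) - 1                 # keep the 21-bit field
    if disp & (1 << 20):                  # sign-extend from bit 20
        disp -= 1 << 21
    return (updated_pc + 4 * disp) & ((1 << 64) - 1)


print(hex(branch_target(0x1000, 0)))         # falls through to the next instruction
print(hex(branch_target(0x1000, 0x1FFFFF)))  # displacement -1: branch to itself
print(hex(branch_target(0x1000, 0xFFFFF)))   # largest forward displacement
```

A displacement of -1 yields the address of the branch itself, and the largest positive displacement reaches just under 1M instructions (4 MB) forward of the updated PC.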


Notes:


• To branch properly on non-finite operands, compare to F31, then branch on the
  result of the compare.

• The largest negative integer (hexadecimal 8000 0000 0000 0000) is the same bit
  pattern as floating minus zero, so it is treated as equal to zero by the branch
  instructions. To branch properly on the largest negative integer, convert it to
  floating or move it to an integer register and do an integer branch.


4.10 Floating-Point Operate Format Instructions


The floating-point bit-operate instructions perform copy and integer convert
operations on 64-bit register values. The bit-operate instructions do not interpret
the bits moved in any way; specifically, they do not trap on non-finite values.


The floating-point arithmetic-operate instructions perform add, subtract, multiply, 
divide, compare, and floating convert operations on 64-bit register values in one of 
the four specified floating formats. 


Each instruction specifies the source and destination formats of the values, as well 
as the rounding mode and trapping mode to be used. These instructions use the 
Floating-point Operate format. 


The floating-point operate instructions are summarized in Table 4-11.

Table 4-11: Floating-Point Operate Instructions Summary

Mnemonic   Operation                                    Subset

Bit and FPCR Operations

CPYS       Copy Sign                                    Both
CPYSE      Copy Sign and Exponent                       Both
CPYSN      Copy Sign Negate                             Both
CVTLQ      Convert Longword to Quadword                 Both
CVTQL      Convert Quadword to Longword                 Both
FCMOVxx    Floating Conditional Move                    Both
MF_FPCR    Move from Floating-point Control Register    Both
MT_FPCR    Move to Floating-point Control Register      Both

Arithmetic Operations

ADDF       Add F_floating                               VAX
ADDG       Add G_floating                               VAX
ADDS       Add S_floating                               IEEE
ADDT       Add T_floating                               IEEE

CMPGxx     Compare G_floating                           VAX
CMPTxx     Compare T_floating                           IEEE

CVTDG      Convert D_floating to G_floating             VAX
CVTGD      Convert G_floating to D_floating             VAX
CVTGF      Convert G_floating to F_floating             VAX
CVTGQ      Convert G_floating to Quadword               VAX
CVTQF      Convert Quadword to F_floating               VAX
CVTQG      Convert Quadword to G_floating               VAX
CVTQS      Convert Quadword to S_floating               IEEE
CVTQT      Convert Quadword to T_floating               IEEE
CVTTQ      Convert T_floating to Quadword               IEEE
CVTTS      Convert T_floating to S_floating             IEEE

DIVF       Divide F_floating                            VAX
DIVG       Divide G_floating                            VAX
DIVS       Divide S_floating                            IEEE
DIVT       Divide T_floating                            IEEE

MULF       Multiply F_floating                          VAX
MULG       Multiply G_floating                          VAX
MULS       Multiply S_floating                          IEEE
MULT       Multiply T_floating                          IEEE

SUBF       Subtract F_floating                          VAX
SUBG       Subtract G_floating                          VAX
SUBS       Subtract S_floating                          IEEE
SUBT       Subtract T_floating                          IEEE


4.10.1 Copy Sign 


Format: 


CPYSy Fa.rq,Fb.rq,Fc.wq !Floating-point Operate format


Operation: 


CASE
    CPYS:  Fc ← Fav<63> || Fbv<62:0>
    CPYSN: Fc ← NOT(Fav<63>) || Fbv<62:0>
    CPYSE: Fc ← Fav<63:52> || Fbv<51:0>
ENDCASE


Exceptions:


None


Instruction mnemonics: 


CPYS Copy Sign 
CPYSE Copy Sign and Exponent 
CPYSN Copy Sign Negate 


Qualifiers: 


None 


Description: 


For CPYS and CPYSN, the sign bit of Fa is fetched (and complemented in the case 
of CPYSN) and concatenated with the exponent and fraction bits from Fb; the result 
is stored in Fc. 


For CPYSE, the sign and exponent bits from Fa are fetched and concatenated with 
the fraction bits from Fb; the result is stored in Fc. 


No checking of the operands is performed. 


Notes: 


• Register moves can be performed using CPYS Fx,Fx,Fy. Floating-point absolute
value can be done using CPYS F31,Fx,Fy. Floating-point negation can be done 
using CPYSN Fx,Fx,Fy. Floating values can be scaled to a known range by using 
CPYSE. 
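
The three copy-sign operations and the idioms in the note above can be modeled on 64-bit register images. This is a sketch under stated assumptions: the helper names are hypothetical, register values are unsigned Python integers below 2**64, and the T_floating register image is taken to be the IEEE double-precision bit pattern so that struct can be used to show the numeric effect.

```python
import struct

MASK64 = (1 << 64) - 1
SIGN = 1 << 63          # bit <63>
SIGN_EXP = 0xFFF << 52  # bits <63:52>

def cpys(fav: int, fbv: int) -> int:
    return (fav & SIGN) | (fbv & ~SIGN & MASK64)

def cpysn(fav: int, fbv: int) -> int:
    return (~fav & SIGN) | (fbv & ~SIGN & MASK64)

def cpyse(fav: int, fbv: int) -> int:
    return (fav & SIGN_EXP) | (fbv & ~SIGN_EXP & MASK64)

def t_bits(x: float) -> int:
    """Bit pattern of x as an IEEE double (assumed T_floating image)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def t_value(bits: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

F31 = 0  # F31 always reads as zero

x = t_bits(-3.5)
move = cpys(x, x)           # CPYS  Fx,Fx,Fy
absolute = cpys(F31, x)     # CPYS  F31,Fx,Fy
negated = cpysn(x, x)       # CPYSN Fx,Fx,Fy
scaled = cpyse(t_bits(1.0), t_bits(12.0))  # fraction of 12.0, exponent of 1.0
```

In the last line, CPYSE keeps the fraction of 12.0 but takes the sign and exponent of 1.0, which places the value in the range 1.0 to 2.0.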







4.10.2 Convert Integer to Integer


Format:


CVTxy Fb.rq,Fc.wx !Floating-point Operate format


Operation: 


CASE
    CVTQL: Fc ← Fbv<31:30> || 0<2:0> ||
                Fbv<29:0> || 0<28:0>
    CVTLQ: Fc ← SEXT(Fbv<63:62> || Fbv<58:29>)
ENDCASE


Exceptions:


Integer Overflow, CVTQL only


Instruction mnemonics: 


CVTLQ Convert Longword to Quadword 
CVTQL Convert Quadword to Longword 


Qualifiers: 


Trapping: Software (/S) 
Integer Overflow Enable (/V) (CVTQL only) 


Description: 


The two’s-complement operand in register Fb is converted to a two’s-complement 
result and written to register Fc. 


The conversion from quadword to longword is a repositioning of the low 32 bits of 
the operand, with zero fill and optional integer overflow checking. Integer overflow 
occurs if Fb is outside the range -2**31..2**31-1. If integer overflow occurs, the
truncated result is stored in Fc, and an arithmetic trap is taken if enabled. 


The conversion from longword to quadword is a repositioning of 32 bits of the 
operand, with sign extension. 
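
The bit repositioning performed by the two conversions can be sketched directly from the Operation section. The function names are hypothetical, and register values are assumed to be unsigned Python integers below 2**64.

```python
MASK64 = (1 << 64) - 1

def cvtql(fbv: int) -> int:
    """Fc <- Fbv<31:30> || 0<2:0> || Fbv<29:0> || 0<28:0>"""
    hi2 = (fbv >> 30) & 0x3
    lo30 = fbv & 0x3FFFFFFF
    return (hi2 << 62) | (lo30 << 29)

def cvtlq(fbv: int) -> int:
    """Fc <- SEXT(Fbv<63:62> || Fbv<58:29>)"""
    long32 = (((fbv >> 62) & 0x3) << 30) | ((fbv >> 29) & 0x3FFFFFFF)
    if long32 & 0x80000000:
        long32 -= 1 << 32  # sign-extend from bit 31
    return long32 & MASK64

def cvtql_overflows(fbv: int) -> bool:
    """True if the quadword lies outside -2**31..2**31-1."""
    signed = fbv - (1 << 64) if fbv >> 63 else fbv
    return not (-(1 << 31) <= signed <= (1 << 31) - 1)
```

For any quadword in the longword range, cvtlq(cvtql(q)) returns q unchanged; outside that range only the low 32 bits survive, which is the truncated result mentioned above.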







4.10.3 Floating-Point Conditional Move 


Format: 


FCMOVxx Fa.rq,Fb.rq,Fc.wq !Floating-point Operate format


Operation: 


IF TEST(Fav, Condition_based_on_Opcode) THEN
    Fc ← Fbv


Exceptions: 


None 


Instruction mnemonics: 


FCMOVEQ FCMOVE if Register Equal to Zero
FCMOVGE FCMOVE if Register Greater Than or Equal to Zero
FCMOVGT FCMOVE if Register Greater Than Zero
FCMOVLE FCMOVE if Register Less Than or Equal to Zero
FCMOVLT FCMOVE if Register Less Than Zero
FCMOVNE FCMOVE if Register Not Equal to Zero


Qualifiers:


None


Description:


Register Fa is tested. If the specified relationship is true, register Fb is written to 
register Fc; otherwise, the move is suppressed and register Fc is unchanged. The 
test is based on the sign bit and whether the rest of the register is all zero bits, as 
described for floating branches in Section 4.9. 

Notes:

Except that it is likely in many implementations to be substantially faster, the
instruction:

    FCMOVxx Fa,Fb,Fc

is exactly equivalent to:

    FByy Fa,label ; yy = NOT xx
    CPYS Fb,Fb,Fc
label:

For example, a branchless sequence for:

    F1=MAX(F1,F2)

is:

    CMPxLT F1,F2,F3 ! F3=one if F1<F2; x=F/G/S/T
    FCMOVNE F3,F2,F1 ! Move F2 to F1 if F1<F2
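
The branchless MAX sequence can be traced with a small model. This is a sketch only: it uses the T_floating variant (CMPTLT), assumes finite operands, takes the T_floating register image to be the IEEE double bit pattern, and uses 4000 0000 0000 0000 (hex) as the non-zero value written by a true compare. All function names are hypothetical.

```python
import struct

TRUE_RESULT = 0x4000000000000000  # non-zero value written by a true compare
REST = (1 << 63) - 1              # bits <62:0>

def t_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def t_value(bits: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def cmptlt(fav: int, fbv: int) -> int:
    """CMPTLT Fa,Fb,Fc for finite operands."""
    return TRUE_RESULT if t_value(fav) < t_value(fbv) else 0

def fcmovne(fav: int, fbv: int, fcv: int) -> int:
    """FCMOVNE Fa,Fb,Fc: returns the new Fc (old Fc if the move is suppressed)."""
    return fbv if (fav & REST) != 0 else fcv

def fmax(f1: int, f2: int) -> int:
    f3 = cmptlt(f1, f2)         # CMPTLT  F1,F2,F3
    return fcmovne(f3, f2, f1)  # FCMOVNE F3,F2,F1
```

When the compare is false, F3 is a true zero, the move is suppressed, and F1 keeps its original value.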

4.10.4 Move from/to Floating-Point Control Register


Format:


Mx_FPCR Fa.rq,Fa.rq,Fa.wq !Floating-point Operate format


Operation: 


CASE
    MT_FPCR: FPCR ← Fav
    MF_FPCR: Fa ← FPCR
ENDCASE


Exceptions: 


None 


Instruction mnemonics: 


MF_FPCR Move from Floating-point Control Register 
MT_FPCR Move to Floating-point Control Register 


Qualifiers: 


None 


Description: 


The Floating-point Control Register (FPCR) is read from (MF_FPCR) or written 
to (MT_FPCR), a floating-point register. The floating-point register to be used is 
specified by the Fa, Fb, and Fc fields all pointing to the same floating-point register. 
If the Fa, Fb, and Fc fields do not all point to the same floating-point register, then 
it is UNPREDICTABLE which register is used. 


The use of these instructions and the FPCR are described in Section 4.7.7. 

4.10.5 VAX Floating Add


Format:


ADDx Fa.rx,Fb.rx,Fc.wx !Floating-point Operate format


Operation:


Fc ← Fav + Fbv


Exceptions: 


Invalid Operation 
Overflow 
Underflow


Instruction mnemonics:


ADDF Add F_floating
ADDG Add G_floating 
Qualifiers: 


Rounding: Chopped (/C) 
Trapping: Software (/S) 
Underflow Enable (/U) 


Description: 
Register Fa is added to register Fb, and the sum is written to register Fc.


The sum is rounded or chopped to the specified precision, and then the corresponding 
range is checked for overflow/underflow. The single-precision operation on canonical 
single-precision values produces a canonical single-precision result. 


An invalid operation trap is signaled if either operand has exp=0 and is not a true 
zero (that is, VAX reserved operands and dirty zeros trap). The contents of Fc are 
UNPREDICTABLE if this occurs. See Section 4.7.5 for details of the stored result 
on overflow or underflow. 
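
The operand check for the VAX formats can be sketched as a classification of the 64-bit register image. This is an illustrative model under assumptions not stated in this section: the register layout is taken to be sign in bit <63>, exponent in bits <62:52>, and fraction in bits <51:0>, and the terms are read in the usual VAX sense (a reserved operand has exp=0 with the sign bit set; a dirty zero has exp=0, sign clear, and a non-zero fraction). The function names are hypothetical.

```python
def classify_vax(fv: int) -> str:
    """Classify a VAX floating register image (assumed layout, see lead-in)."""
    sign = (fv >> 63) & 1
    exp = (fv >> 52) & 0x7FF
    frac = fv & ((1 << 52) - 1)
    if exp != 0:
        return "finite"
    if sign:
        return "reserved operand"
    if frac:
        return "dirty zero"
    return "true zero"

def vax_invalid_operation(fav: int, fbv: int) -> bool:
    """True if either operand has exp=0 and is not a true zero."""
    bad = ("reserved operand", "dirty zero")
    return classify_vax(fav) in bad or classify_vax(fbv) in bad
```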

4.10.6 IEEE Floating Add


Format:


ADDx Fa.rx,Fb.rx,Fc.wx !Floating-point Operate format


Operation:


Fc ← Fav + Fbv


Exceptions: 


Invalid Operation 
Overflow 
Underflow 
Inexact Result 


Instruction mnemonics: 


ADDS Add S_floating
ADDT Add T_floating 
Qualifiers: 
Rounding: Dynamic (/D) 
Minus infinity (/M) 
Chopped (/C) 
Trapping: Software (/S) 
Underflow Enable (/U) 
Inexact Enable (/I) 
Description: 


Register Fa is added to register Fb, and the sum is written to register Fc. 


The sum is rounded to the specified precision, and then the corresponding range is 
checked for overflow/underflow. The single-precision operation on canonical single- 
precision values produces a canonical single-precision result. 

An invalid operation trap is signaled if either operand has exp=0 and a non-zero
fraction (IEEE denormals trap), or if exp=all-ones (IEEE NaNs and infinities trap).
The contents of Fc are UNPREDICTABLE if this occurs.


See Section 4.7.5 for details of the stored result on overflow, underflow, or inexact
result.
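
The operand check for IEEE add can be expressed as a classification of the T_floating register image. This is a sketch, assuming the register layout of an IEEE double (sign in bit <63>, exponent in bits <62:52>, fraction in bits <51:0>); the function names are hypothetical.

```python
import struct

def t_bits(x: float) -> int:
    """Bit pattern of x as an IEEE double (assumed T_floating image)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def classify_ieee(fv: int) -> str:
    exp = (fv >> 52) & 0x7FF
    frac = fv & ((1 << 52) - 1)
    if exp == 0:
        return "zero" if frac == 0 else "denormal"
    if exp == 0x7FF:
        return "infinity" if frac == 0 else "NaN"
    return "finite"

def add_invalid_operation(fav: int, fbv: int) -> bool:
    """True if either operand is a denormal, a NaN, or an infinity."""
    bad = ("denormal", "NaN", "infinity")
    return classify_ieee(fav) in bad or classify_ieee(fbv) in bad
```

Both +0 and -0 classify as zero and do not trap.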

4.10.7 VAX Floating Compare


Format:


CMPGyy Fa.rx,Fb.rx,Fc.wq !Floating-point Operate format


Operation:


IF Fav SIGNED_RELATION Fbv THEN
    Fc ← 4000 0000 0000 0000₁₆
ELSE
    Fc ← 0000 0000 0000 0000₁₆


Exceptions: 


Invalid Operation 


Instruction mnemonics: 


CMPGEQ Compare G_floating Equal 
CMPGLE Compare G_floating Less Than or Equal 
CMPGLT Compare G_floating Less Than 
Qualifiers: 
Trapping: Software (/S) 
Description: 


The two operands in Fa and Fb are compared. If the relationship specified by the 
qualifier is true, a non-zero floating value (0.5) is written to register Fc; otherwise, 
a true zero is written to Fc. 


Comparisons are exact and never overflow or underflow. Three mutually exclusive 
relations are possible: less than, equal, and greater than. 


An invalid operation trap is signaled if either operand has exp=0 and is not a true 
zero (that is, VAX reserved operands and dirty zeros trap). The contents of Fc are 
UNPREDICTABLE if this occurs. 


Notes: 


• Compare Less Than A,B is the same as Compare Greater Than B,A; Compare
Less Than or Equal A,B is the same as Compare Greater Than or Equal B,A. 
Therefore, only the less-than operations are included. 

4.10.8 IEEE Floating Compare


Format:


CMPTyy Fa.rx,Fb.rx,Fc.wq !Floating-point Operate format


Operation:


IF Fav SIGNED_RELATION Fbv THEN
    Fc ← 4000 0000 0000 0000₁₆
ELSE
    Fc ← 0000 0000 0000 0000₁₆


Exceptions: 


Invalid Operation 


Instruction mnemonics: 


CMPTEQ Compare T_floating Equal
CMPTLE Compare T_floating Less Than or Equal
CMPTLT Compare T_floating Less Than
CMPTUN Compare T_floating Unordered


Qualifiers:


Trapping: Software (/S)


Description:


The two operands in Fa and Fb are compared. If the relationship specified by the 
qualifier is true, a non-zero floating value (2.0) is written to register Fc; otherwise, 
a true zero is written to Fc. 


Comparisons are exact and never overflow or underflow. Four mutually exclusive 
relations are possible: less than, equal, greater than, and unordered. The unordered 
relation is true if one or both operands are NaN. (This behavior must be provided 
by a software trap handler, since NaNs trap.) Comparisons ignore the sign of zero,
so +0 = -0.


An invalid operation trap is signaled if either operand has exp=0 and a non-zero 
fraction (IEEE denormals trap), or if exp=all-ones and a non-zero fraction (IEEE 
NaNs). The contents of Fc are UNPREDICTABLE if this occurs. 

Comparisons with plus and minus infinity execute normally and do not take an
invalid operation trap. \ This was added to support fast path selection through 
infinity testing in scientific codes. \ 


Notes: 


• Compare Less Than A,B is the same as Compare Greater Than B,A; Compare
Less Than or Equal A,B is the same as Compare Greater Than or Equal B,A. 
Therefore, only the less-than operations are included. 
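
Both compare instructions write the same bit pattern, 4000 0000 0000 0000 (hex), for a true result; it reads as 0.5 in G_floating and as 2.0 in T_floating. The sketch below decodes the pattern both ways. It assumes the usual definitions of the two formats (G_floating: excess-1024 exponent with a hidden-bit fraction of the form 0.1f; T_floating: IEEE double), and the function names are hypothetical.

```python
import struct

TRUE_RESULT = 0x4000000000000000

def as_t_floating(bits: int) -> float:
    """Interpret a register image as an IEEE double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def as_g_floating(bits: int) -> float:
    """Interpret a register image as VAX G_floating.

    Only true zero and finite values are modeled; reserved operands
    and dirty zeros (exp=0) are simply returned as 0.0 here.
    """
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF
    frac = bits & ((1 << 52) - 1)
    if exp == 0:
        return 0.0
    mantissa = 0.5 + frac / (1 << 53)  # hidden bit: 0.1f in binary
    value = mantissa * 2.0 ** (exp - 1024)
    return -value if sign else value
```

Because the pattern is non-zero with the sign bit clear in either format, a following FBNE or FCMOVNE treats it as true regardless of which compare produced it.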

4.10.9 Convert VAX Floating to Integer


Format:


CVTGQ Fb.rx,Fc.wq !Floating-point Operate format


Operation:


Fc ← {conversion of Fbv}


Exceptions: 


Invalid Operation
Integer Overflow 


Instruction mnemonics: 


CVTGQ Convert G_floating to Quadword


Qualifiers: 


Rounding: Chopped (/C) 
Trapping: Software (/S) 
Integer Overflow Enable (/V) 


Description: 


The floating operand in register Fb is converted to a two’s-complement quadword 
number and written to register Fc. The conversion aligns the operand fraction with 
the binary point just to the right of bit zero, rounds as specified, and complements
the result if negative.


An invalid operation trap is signaled if the operand has exp=0 and is not a true
zero (that is, VAX reserved operands and dirty zeros trap). The contents of Fc are
UNPREDICTABLE if this occurs. 


See Section 4.7.5 for details of the stored result on integer overflow. 

4.10.10 Convert Integer to VAX Floating


Format:


CVTQy Fb.rq,Fc.wx !Floating-point Operate format


Operation:


Fc ← {conversion of Fbv<63:0>}


Exceptions: 

None 


Instruction mnemonics: 


CVTQF Convert Quadword to F_floating
CVTQG Convert Quadword to G_floating 


Qualifiers: 
Rounding: Chopped (/C) 


Description: 


The two’s-complement quadword operand in register Fb is converted to a single- 
or double-precision floating result and written to register Fc. The conversion 
complements a number if negative, normalizes it, rounds to the target precision, 
and packs the result with an appropriate sign and exponent field. 

4.10.11 Convert VAX Floating to VAX Floating


Format:


CVTxy Fb.rx,Fc.wx !Floating-point Operate format


Operation:


Fc ← {conversion of Fbv}


Exceptions: 


Invalid Operation
Overflow 
Underflow 


Instruction mnemonics: 


CVTDG Convert D_floating to G_floating
CVTGD Convert G_floating to D_floating
CVTGF Convert G_floating to F_floating


Qualifiers: 

Rounding: Chopped (/C) 

Trapping: Software (/S) 

Underflow Enable (/U) 

Description: 


The floating operand in register Fb is converted to the specified alternate floating 
format and written to register Fc. 


An invalid operation trap is signaled if the operand has exp=0 and is not a true 
zero (that is, VAX reserved operands and dirty zeros trap). The contents of Fc are 
UNPREDICTABLE if this occurs. 


See Section 4.7.5 for details of the stored result on overflow or underflow.


Notes: 


• The only arithmetic operations on D_floating values are conversions to and from
G_floating. The conversion to G_floating rounds or chops as specified, removing
three fraction bits. The conversion from G_floating to D_floating adds three low-
order zeros as fraction bits, then the 8-bit exponent range is checked for
overflow/underflow.


• The conversion from G_floating to F_floating rounds or chops to single precision,
then the 8-bit exponent range is checked for overflow/underflow.


• No conversion from F_floating to G_floating is required, since F_floating values
are always stored in registers as equivalent G_floating values.







4.10.12 Convert IEEE Floating to Integer


Format:


CVTTQ Fb.rx,Fc.wq !Floating-point Operate format


Operation:


Fc ← {conversion of Fbv}


Exceptions: 


Invalid Operation 
Inexact Result 
Integer Overflow 


Instruction mnemonics: 


CVTTQ Convert T_floating to Quadword 
Qualifiers: 
Rounding: Dynamic (/D) 
Minus infinity (/M) 
Chopped (/C) 
Trapping: Software (/S) 
Integer Overflow Enable (/V) 
Inexact Enable (/I)
Description: 


The floating operand in register Fb is converted to a two’s-complement number and 
written to register Fc. The conversion aligns the operand fraction with the binary 
point just to the right of bit zero, rounds as specified, and complements the result if 
negative.


An invalid operation trap is signaled if either operand has exp=0 and a non-zero 
fraction (IEEE denormals trap), or if exp=all-ones (IEEE NaNs and infinities trap). 


The contents of Fc are UNPREDICTABLE if this occurs.


See Section 4.7.5 for details of the stored result on integer overflow and inexact
result.

4.10.13 Convert Integer to IEEE Floating


Format:


CVTQy Fb.rq,Fc.wx !Floating-point Operate format


Operation:


Fc ← {conversion of Fbv<63:0>}


Exceptions: 
Inexact Result 


Instruction mnemonics:


CVTQS Convert Quadword to S_floating
CVTQT Convert Quadword to T_floating 
Qualifiers: 
Rounding: Dynamic (/D) 
Minus infinity (/M) 
Chopped (/C) 
Trapping: Software (/S) 
Inexact Enable (/I)
Description: 


The two’s-complement operand in register Fb is converted to a single- or double- 
precision floating result and written to register Fc. The conversion complements 
a number if negative, normalizes it, rounds to the target precision, and packs the 
result with an appropriate sign and exponent field. 


See Section 4.7.5 for details of the stored result on inexact result. 
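
The inexact condition for CVTQT can be illustrated with a short model. This is a sketch under assumptions: it models only round-to-nearest-even, which is what Python's integer-to-float conversion performs (the /C, /M, and /D modes are not modeled), and it treats a Python float as a T_floating value. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def to_signed(fbv: int) -> int:
    """Interpret a 64-bit register image as a two's-complement quadword."""
    fbv &= MASK64
    return fbv - (1 << 64) if fbv >> 63 else fbv

def cvtqt(fbv: int):
    """Return (result, inexact) for a quadword-to-T_floating conversion."""
    q = to_signed(fbv)
    result = float(q)           # rounds to nearest, ties to even
    inexact = int(result) != q  # result differs from the exact integer
    return result, inexact
```

A T_floating fraction holds 53 significant bits, so every quadword of magnitude up to 2**53 converts exactly; 2**53 + 1 is the smallest positive quadword that does not.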

4.10.14 Convert IEEE Floating to IEEE Floating


Format:


CVTTS Fb.rx,Fc.wx !Floating-point Operate format


Operation:


Fc ← {conversion of Fbv}


Exceptions: 


Invalid Operation
Overflow
Underflow
Inexact Result


Instruction mnemonics: 


CVTTS Convert T_floating to S_floating 
Qualifiers: 
Rounding: Dynamic (/D) 
Minus infinity (/M) 
Chopped (/C) 
Trapping: Software (/S) 
Underflow Enable (/U) 
Inexact Enable (/I) 
Description: 


The floating operand in register Fb is converted to the specified alternate floating 
format and written to register Fc. 


An invalid operation trap is signaled if either operand has exp=0 and a non-zero 
fraction (IEEE denormals trap), or if exp=all-ones (IEEE NaNs and infinities trap). 


The contents of Fc are UNPREDICTABLE if this occurs. 


See Section 4.7.5 for details of the stored result on overflow, underflow, or inexact 
result.


Notes:


• No conversion from S_floating to T_floating is required, since S_floating values
are always stored in registers as equivalent T_floating values.

4.10.15 VAX Floating Divide


Format:


DIVx Fa.rx,Fb.rx,Fc.wx !Floating-point Operate format


Operation:


Fc ← Fav / Fbv


Exceptions: 


Invalid Operation 
Division by Zero 
Overflow 
Underflow


Instruction mnemonics: 


DIVF Divide F_floating 
DIVG Divide G_floating 
Qualifiers: 
Rounding: Chopped (/C) 
Trapping: Software (/S) 
Underflow Enable (/U)
Description: 


The dividend operand in register Fa is divided by the divisor operand in register Fb, 
and the quotient is written to register Fc. 

The quotient is rounded or chopped to the specified precision, and then the
corresponding range is checked for overflow/underflow. The single-precision
operation on canonical single-precision values produces a canonical single-precision
result.


An invalid operation trap is signaled if either operand has exp=0 and is not a true
zero (that is, VAX reserved operands and dirty zeros trap). The contents of Fc are 
UNPREDICTABLE if this occurs. 


A division by zero trap is signaled if Fbv is zero. The contents of Fc are 
UNPREDICTABLE if this occurs. 


See Section 4.7.5 for details of the stored result on overflow or underflow. 

4.10.16 IEEE Floating Divide


Format:


DIVx Fa.rx,Fb.rx,Fc.wx !Floating-point Operate format


Operation:


Fc ← Fav / Fbv


Exceptions: 


Invalid Operation 
Division by Zero 
Overflow 
Underflow
Inexact Result


Instruction mnemonics: 


DIVS Divide S_floating 
DIVT Divide T_floating 
Qualifiers: 


Rounding: Dynamic (/D) 
Minus infinity (/M) 
Chopped (/C) 
Trapping: Software (/S) 
Underflow Enable (/U) 
Inexact Enable (/I) 


Description: 


The dividend operand in register Fa is divided by the divisor operand in register Fb, 
and the quotient is written to register Fc. 


The quotient is rounded to the specified precision, and then the corresponding range 
is checked for overflow/underflow. The single-precision operation on canonical single- 
precision values produces a canonical single-precision result. 

An invalid operation trap is signaled if either operand has exp=0 and a non-zero
fraction (IEEE denormals trap), or if exp=all-ones (IEEE NaNs and infinities trap).
The contents of Fc are UNPREDICTABLE if this occurs.


A division by zero trap is signaled if Fbv is zero. The contents of Fc are
UNPREDICTABLE if this occurs. 


See Section 4.7.5 for details of the stored result on overflow, underflow, or inexact 
result. 

4.10.17 VAX Floating Multiply


Format:


MULx Fa.rx,Fb.rx,Fc.wx !Floating-point Operate format


Operation:


Fc ← Fav * Fbv


Exceptions: 


Invalid Operation
Overflow 
Underflow 


Instruction mnemonics: 


MULF Multiply F_floating 
MULG Multiply G_floating 
Qualifiers: 
Rounding: Chopped (/C) 
Trapping: Software (/S)
Underflow Enable (/U) 
Description: 


The multiplicand operand in register Fb is multiplied by the multiplier operand in 
register Fa, and the product is written to register Fc. 


The product is rounded or chopped to the specified precision, and then the 
corresponding range is checked for overflow/underflow. The single-precision 
operation on canonical single-precision values produces a canonical single-precision 
result. 


An invalid operation trap is signaled if either operand has exp=0 and is not a true 
zero (that is, VAX reserved operands and dirty zeros trap). The contents of Fc are 
UNPREDICTABLE if this occurs. 


See Section 4.7.5 for details of the stored result on overflow or underflow. 

4.10.18 IEEE Floating Multiply


Format:


MULx Fa.rx,Fb.rx,Fc.wx !Floating-point Operate format


Operation:


Fc ← Fav * Fbv


Exceptions: 


Invalid Operation 
Overflow
Underflow
Inexact Result


Instruction mnemonics: 


MULS Multiply S_floating 
MULT Multiply T_floating 
Qualifiers: 
Rounding: Dynamic (/D) 
Minus infinity (/M) 
Chopped (/C) 
Trapping: Software (/S) 
Underflow Enable (/U)
Inexact Enable (/I) 
Description: 


The multiplicand operand in register Fb is multiplied by the multiplier operand in 
register Fa, and the product is written to register Fc. 


The product is rounded to the specified precision, and then the corresponding range 
is checked for overflow/underflow. The single-precision operation on canonical single- 
precision values produces a canonical single-precision result. 

An invalid operation trap is signaled if either operand has exp=0 and a non-zero
fraction (IEEE denormals trap), or if exp=all-ones (IEEE NaNs and infinities trap).
The contents of Fc are UNPREDICTABLE if this occurs.


See Section 4.7.5 for details of the stored result on overflow, underflow, or inexact 
result. 

4.10.19 VAX Floating Subtract


Format:


SUBx Fa.rx,Fb.rx,Fc.wx !Floating-point Operate format


Operation:


Fc ← Fav - Fbv


Exceptions: 


Invalid Operation 
Overflow 
Underflow 


Instruction mnemonics: 


SUBF Subtract F_floating 
SUBG Subtract G_floating 


Qualifiers: 


Rounding: Chopped (/C) 
Trapping: Software (/S) 
Underflow Enable (/U) 


Description: 


The subtrahend operand in register Fb is subtracted from the minuend operand in
register Fa, and the difference is written to register Fc.


The difference is rounded or chopped to the specified precision, and then the
corresponding range is checked for overflow/underflow. The single-precision
operation on canonical single-precision values produces a canonical single-precision
result.


An invalid operation trap is signaled if either operand has exp=0 and is not a true
zero (that is, VAX reserved operands and dirty zeros trap). The contents of Fc are
UNPREDICTABLE if this occurs.


See Section 4.7.5 for details of the stored result on overflow or underflow.







4.10.20 IEEE Floating Subtract


Format:


SUBx Fa.rx,Fb.rx,Fc.wx !Floating-point Operate format


Operation:


Fc ← Fav - Fbv


Exceptions:


Invalid Operation 
Overflow 
Underflow 
Inexact Result 


Instruction mnemonics: 


SUBS Subtract S_floating 
SUBT Subtract T_floating 
Qualifiers: 
Rounding: Dynamic (/D) 
Minus infinity (/M) 
Chopped (/C) 
Trapping: Software (/S) 
Underflow Enable (/U) 
Inexact Enable (/I)
Description: 


The subtrahend operand in register Fb is subtracted from the minuend operand in 
register Fa, and the difference is written to register Fc. 


The difference is rounded to the specified precision, and then the corresponding 
range is checked for overflow/underflow. The single-precision operation on canonical 
single-precision values produces a canonical single-precision result. 


An invalid operation trap is signaled if either operand has exp=0 and a non-zero 
fraction (IEEE denormals trap), or if exp=all-ones (IEEE NaNs and infinities trap).
The contents of Fc are UNPREDICTABLE if this occurs.


See Section 4.7.5 for details of the stored result on overflow, underflow, or inexact
result.

4.11 Miscellaneous Instructions


Alpha provides the miscellaneous instructions shown in Table 4-12. 


Table 4-12: Miscellaneous Instructions Summary

Mnemonic   Operation

CALL_PAL   Call Privileged Architecture Library Routine
FETCH      Prefetch Data
FETCH_M    Prefetch Data, Modify Intent
MB         Memory Barrier
RPCC       Read Process Cycle Counter
TRAPB      Trap Barrier

4.11.1 Call Privileged Architecture Library


Format:


CALL_PAL fnc.ir !PAL format


Operation: 


{Stall instruction issuing until all 
prior instructions are guaranteed to 
complete without incurring exceptions. } 
{Trap to PALcode. } 


Exceptions: 


None 


Instruction mnemonics: 


CALL_PAL Call Privileged Architecture Library 


Qualifiers: 


None 


Description: 


The CALL_PAL instruction is not issued until all previous instructions are 
guaranteed to complete without exceptions. If an exception occurs, the continuation 
PC in the exception stack frame points to the CALL_PAL instruction. The CALL_ 
PAL instruction causes a trap to PALcode. 

4.11.2 Prefetch Data


Format:


FETCHx 0(Rb.ab) !Memory format


Operation:


va ← {Rbv}
{Optionally prefetch aligned 512-byte block surrounding va. } 


Exceptions: 


None 


Instruction mnemonics: 


FETCH Prefetch Data 
FETCH_M  Prefetch Data, Modify Intent 


Qualifiers: 


None 


Description: 


The virtual address is given by Rbv. This address is used to designate an aligned 
512-byte block of data. An implementation may optionally attempt to move all or 
part of this block (or a larger surrounding block) of data to a faster-access part of 
the memory hierarchy, in anticipation of subsequent Load or Store instructions that 
access that data. 
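
The aligned 512-byte block designated by a FETCHx address can be computed as shown below. This is a minimal sketch; the function name is hypothetical, and it says nothing about how much of the block (if any) an implementation actually moves.

```python
BLOCK_SIZE = 512

def fetch_block(va: int):
    """Return (first, last) byte addresses of the aligned 512-byte block containing va."""
    base = va & ~(BLOCK_SIZE - 1)
    return base, base + BLOCK_SIZE - 1
```

Any two addresses that agree in all bits above bit 8 designate the same block.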


The FETCH instruction is a hint to the implementation that may allow faster 
execution. An implementation is free to ignore the hint. If prefetching is 
done in an implementation, the order of fetch within the designated block is 
UNPREDICTABLE. 


The FETCH_M instruction gives the additional hint that modifications (stores) to
some or all of the data block are anticipated. 


No exceptions are generated by FETCHx. If a Load (or Store in the case of FETCH_ 
M) that uses the same address would fault, the prefetch request is ignored. It is 
UNPREDICTABLE whether a TB-miss fault is ever taken by FETCHx.


IMPLEMENTATION NOTE
Implementations are encouraged to take the TB-miss
fault, then continue the prefetch.


The programming model for effective use of FETCH and FETCH_M is given in
Appendix A.


SOFTWARE NOTE

FETCH is intended to help software overlap memory 
latencies on the order of 100 cycles. FETCH is unlikely 
to help (or be implemented) for memory latencies on the 
order of 10 cycles. Code scheduling should be used to 
overlap such short latencies. 

4.11.3 Memory Barrier


Format: 


MB !Memory format 


Operation: 


{Guarantee that all subsequent loads or stores 
will not access memory until after all previous 
loads and stores have accessed memory, as 
observed by other processors. } 


Exceptions: 


None 


Instruction mnemonics: 


MB Memory Barrier 


Qualifiers: 


None 


Description: 


The use of the Memory Barrier (MB) instruction is required only in multiprocessor 
systems. 


In the absence of an MB instruction, loads and stores to different physical locations
are allowed to complete out of order on the issuing processor as observed by other
processors. The MB instruction allows memory accesses to be serialized on the
issuing processor as observed by other processors. See Chapter 5 for details on using
the MB instruction to serialize these accesses. Chapter 5 also details coordinating
memory accesses across processors.


Note that MB ensures serialization only; it does not necessarily accelerate the
progress of memory operations.

4.11.4 Read Process Cycle Counter


Format:


RPCC Ra.wq !Memory format


Operation:


Ra ← {cycle counter}


Exceptions: 


None 


Instruction mnemonics: 


RPCC Read Process Cycle Counter 


Qualifiers: 


None 


Description: 


Register Ra is written with the process cycle counter (PCC). 


The low-order 32 bits of the process cycle counter is an unsigned 32-bit integer that 
increments once per N CPU cycles, where N is an implementation-specific integer in 
the range 1..16. The cycle counter frequency is the number of times the process cycle 
counter gets incremented per second, rounded to a 64-bit integer. The integer count 
wraps to 0 from a count of FFFF FFFF₁₆. The counter wraps no more frequently than
1.5 times the implementation’s interval clock interrupt period (which is two thirds 
of the interval clock interrupt frequency). The high-order 32 bits of the process cycle 
counter are an offset that when added to the low-order 32 bits gives the cycle count 
for this process. 


The process cycle counter is suitable for timing intervals on the order of nanoseconds 
and may be used for detailed performance characterization. It is required on all 
implementations. PCC is required for every processor, and each processor in a 
multiprocessor system has its own private, independent PCC. 


\INTERNAL IMPLEMENTATION NOTE 
An implementation-dependent mechanism must exist 
that, when enabled, causes the RPCC instruction always 
to return a zero in Ra. This mechanism must be usable 
by privileged system software. \ 

As an example, consider the following code that returns in R0 the current cycle count
MOD 2**32.


    RPCC R0          ; Read the process cycle counter
    SLL  R0, #32, R1 ; Line up the offset and count fields
    ADDQ R0, R1, R0  ; Do add
    SRL  R0, #32, R0 ; Zero extend the cycle count to 64 bits
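
The four-instruction sequence can be checked against the direct definition (offset plus count, modulo 2**32) with a short model. This is a sketch only; the function names are hypothetical, and the value returned by RPCC is passed in as an unsigned 64-bit integer with the offset in the high-order longword and the count in the low-order longword.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def cycle_count(pcc: int) -> int:
    """Mirror the SLL / ADDQ / SRL sequence applied to the RPCC result."""
    r0 = pcc & MASK64
    r1 = (r0 << 32) & MASK64  # SLL  R0, #32, R1
    r0 = (r0 + r1) & MASK64   # ADDQ R0, R1, R0
    return r0 >> 32           # SRL  R0, #32, R0

def cycle_count_direct(pcc: int) -> int:
    """Offset (high longword) plus count (low longword), modulo 2**32."""
    return ((pcc >> 32) + (pcc & MASK32)) & MASK32
```

The shift moves the count field up alongside the offset field, so the 64-bit add produces their sum in the high-order longword; any carry out of bit 63 is discarded, which yields the modulo 2**32 behavior.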
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4.11.5 Trap Barrier 


Format: 


TRAPB !Memory format


Operation: 


{Stall instruction issuing until all prior instructions are 
guaranteed to complete without incurring arithmetic traps. } 


Exceptions: 


None 


Instruction mnemonics: 


TRAPB Trap Barrier


Qualifiers: 


None 


Description: 


The TRAPB instruction allows software to guarantee that in a pipelined
implementation, all previous arithmetic instructions will complete without incurring
any arithmetic traps before any instructions after the TRAPB are issued. For
example, TRAPB should be used before changing an exception handler to ensure
that all exceptions on previous instructions are processed in the current
exception-handling environment.

4.12 VAX Compatibility Instructions


Alpha provides the instructions shown in Table 4-13 for use in translated VAX code.
These instructions are not a permanent part of the architecture and will not be 
available in some future implementations. They are intended to preserve customer 
assumptions about VAX instruction atomicity in porting code from VAX to Alpha. 


NOTE
\They will be removed, and not emulated, after the first
two full generations of Alpha implementations, that is,
about 1995. \


These instructions should be generated only by the VAX-to-Alpha software 
translator; they should never be used in native Alpha code. Any native code that 
uses them may cease to work. 


Table 4-13: VAX Compatibility Instructions Summary 
Mnemonic Operation 


RC Read and Clear 
RS Read and Set 

4.12.1 VAX Compatibility Instructions


Format:


Rx Ra.wq !Memory format


Operation:


Ra ← intr_flag
intr_flag ← 0 !RC
intr_flag ← 1 !RS


Exceptions:


None


Instruction mnemonics:


RC Read and Clear
RS Read and Set


Qualifiers:


None


Description:


The intr_flag is returned in Ra and then cleared to zero (RC) or set to one (RS). 


These instructions may be used to determine whether the sequence of Alpha 
instructions between RS and RC (corresponding to a single VAX instruction) was 
executed without interruption or exception. 


Intr_flag is a per-processor state bit. The intr_flag is cleared if that processor 
encounters a CALL_PAL REI instruction. 


It is UNPREDICTABLE whether a processor’s intr_flag is affected when that
processor executes an LDx_L or STx_C instruction. A processor’s intr_flag is not
affected when that processor executes a normal load or store instruction.


A processor’s intr_flag is not affected when that processor executes a taken branch.


NOTE
These instructions are intended only for use by the
VAX-to-Alpha software translator; they should never be used
by native code.
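
The behavior of intr_flag described above can be summarized in a small model. The Python sketch below is hypothetical: the class and function names are illustrative assumptions, and an interruption is modeled only by its architecturally visible effect, the CALL_PAL REI that clears intr_flag.

```python
class Processor:
    """Per-processor state relevant to RS and RC (illustrative model only)."""

    def __init__(self) -> None:
        self.intr_flag = 0

    def rs(self) -> int:
        """RS: return the old intr_flag, then set it to one."""
        old, self.intr_flag = self.intr_flag, 1
        return old

    def rc(self) -> int:
        """RC: return the old intr_flag, then clear it to zero."""
        old, self.intr_flag = self.intr_flag, 0
        return old

    def rei(self) -> None:
        """CALL_PAL REI clears intr_flag."""
        self.intr_flag = 0


def ran_uninterrupted(cpu: Processor, interrupted: bool) -> bool:
    """Bracket a translated VAX instruction with RS ... RC."""
    cpu.rs()
    if interrupted:
        cpu.rei()            # an interrupt or exception handler returned via REI
    return cpu.rc() == 1     # flag still set means nothing intervened
```

A translator would retry or repair the translated sequence when the final RC returns zero.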

4.13 \REVISION HISTORY


Revision 5.0, May 12, 1992


1. Added ECO #41 to LDx_C and format style change
2. Changed DRAINT to TRAPB
3. Converted to SDML
4. Modified description of MULQ to specify that operands and result are signed
5. Removed FCMOV and CVTLQ from instructions that set FPCR bits
6. Changed byte mask for INSxx and MSKxx instructions to 16 bit value


Revision 4.0, March 29, 1991


1. Added Scaled Add and Subtract
2. Added FPCR register and accompanying text
3. Bits <13:0> of branch displacement field in RET and JSR_COROUTINE reserved
to Digital software
4. Removed references to D floating point
5. Clarified floating-point subset requirements and added OpenVMS requirements
for FP regs and T_floating memory ops in implementation without floating-point
support
6. Make TEST a dyadic operator with explicit condition argument
7. Fix ADDQ to allow literal as second operand, not first
8. Add format type to Arithmetic and Logical and shift Instructions
9. Rename operator ARITH_SHIFT to ARITH_RIGHT_SHIFT and upgrade
description
10. Add description of how to derive upper 64 bits of product (using UMULH) to
MULQ description
11. Add requirement that F_, D_, and G_floating operate Instructions materialize a
true zero
12. Clarify expressions for MAX F_, D_, G_, S_, and T_ values
13. Reorder special values table in floating-point encodings section
14. Modify MB description to indicate that MB works only on instructions from
issuing processor
15. Disambiguate between instances when floating disabled faults and illegal
instruction traps are taken
16. Clarify that low order bits are returned on integer overflow arithmetic conversion
traps
17. Add description to STx_C Instruction that clarifies implementation requirements
for execution of STx_C Instruction
18. Correct decimal value given for MIN T_floating
19. Impose uniform usage of CASE pseudocode construct
20. Insert spaces into long hex and binary values to improve legibility
21. Added optimized sign-extended byte load code fragment to code examples in
Extract Byte Instruction description
22. Clarify use and significance of X+C notation in code examples for Extract Byte
Instruction
23. Clarify note describing how a Read For Ownership cache coherency protocol can
affect LDx_L/STx_C sequence
24. Change reference in Floating-Point Operate Format Instructions from
‘floating-point arithmetic operations’ to ‘floating-point operate Instructions’
25. Rename RCC instruction to ‘Read Process Cycle Counter’ and modify definition
26. Changed values of displacement bits <13:0> in ‘Jump To Subroutine’ instruction
to indicate that all values from 0010 to 1111₁₆ are reserved to Digital
27. Removed text in Longword Add instruction that described carry detection
28. Specified overflow bits returned for Longword Multiply
29. Removed text in Longword Subtract instruction that described carry detection


Revision 3.0, March 2, 1990


1. Rename GOTO to BR, and JSRs to JMP, JSB, RET
2. Rename MSKxx to ZAPxx
3. Remove CVTFQ, and CMPFxx
4. Remove CVT float-to-longword; add CVTQL/LQ
5. Make non-canonical longword +-* well-defined
6. Rename memory-format JSR to BSR
7. Add VAX compatibility Instructions RC, RS
8. Add Fetch and Fetch_M
9. Add low bit set and clear cmoves
10. Remove Nudge
11. Add longword lock Instructions
12. Remove longword load address Instructions
13. Add quadword load address high
14. Rework the LDx/L description
15. Change EXTxx/INSxx back to V1.0 SRM EXTxx/INSxx/MRGxx
16. Change floating-point exception behavior back to V1.0 SRM behavior


Revision 2.0, October 4, 1989


1. Add TLE provided comment on emulation of Instructions
2. Change shift range from 0..64 to 0..63
3. Remove FASx, SWP, FREEZE, THAW Instructions
4. Add load lock and store conditional Instructions
5. Remove WAIT/WAITF Instructions
6. Change DRAIN to DRAINT and only drain for arithmetic traps
7. Add memory barrier and nudge Instructions
8. Rework Floating-point exceptions
9. Add cycle counter


Revision 1.0, May 23, 1989


1. Rework Floating-point to be unmoded
2. Remove subsetting of integer MUL
3. Remove integer DIV
4. Add Freeze and Thaw
5. Rename Lock/Unlock to SWP and FASx and remove long version of lock
6. Add conditional move
7. Add branch on low bit branches (BLBS/BLBC)
8. Add WAIT/WAITF Instructions


Revision 0.0, March 15, 1989


1. Initial Version

Chapter 5
System Architecture and Programming Implications 


(I) 


5.1 Introduction 


Portions of the Alpha architecture have implications for programming, and
the system structure, of both uniprocessor and multiprocessor implementations.
Architectural implications considered in the following sections are:


• Physical memory behavior
• Caches and write buffers
• Translation buffers and virtual caches
• Data sharing
• Read/write ordering
• Stacks
• Arithmetic traps


To meet the requirements of the Alpha architecture, software and hardware
implementors need to take these issues into consideration.


5.2 Physical Memory Behavior 


Alpha physical memory space is divided into four regions, based on the two most 
significant, implemented, physical address bits. Each region’s behavior can be 
described in terms of its coherency, granularity, width, and memory-like behavior. 


5.2.1 Coherency of Memory Access 


Alpha implementations must provide a coherent view of memory, in which each write 
by a processor or I/O device (hereafter, called “processor”) becomes visible to all other 
processors. No distinction is made between coherency of “memory space” and “I/O 
space”. 


Memory coherency may be provided in different ways for each of the four physical
address regions.


Possible per-region policies include, but are not restricted to: 
1. No caching 


No copies are kept of data in a region; all reads and writes access the actual data 
location (memory or I/O register). 

2. Write-through caching


Copies are kept of any data in the region; reads may use the copies, but writes 
update the actual data location and either update or invalidate all copies. 


3. Write-back caching 


Copies are kept of any data in the region; reads and writes may use the copies, 
and writes use additional state to determine whether there are other copies to 
invalidate or update. 


Part of the coherency policy implemented for a given physical address region may 
include restrictions on excess data transfers (performing more accesses to a location 
than is necessary to acquire or change the location’s value), or may specify data 
transfer widths (the granularity used to access a location). 


Independent of coherency policy, a processor may use different hardware or different 
hardware resource policies for caching or buffering different physical address 
regions. 


5.2.2 Granularity of Memory Access


For each region, an implementation must support aligned quadword access and may 
optionally support aligned longword access. 


For a quadword access region, accesses to physical memory must be implemented 
such that independent accesses to adjacent aligned quadwords produce the same 
results regardless of the order of execution. Further, an access to an aligned 
quadword must be done in a single atomic operation. 


For a longword access region, accesses to physical memory must be implemented 
such that independent accesses to adjacent aligned longwords produce the same 
results regardless of the order of execution. Further, an access to an aligned 
longword must be done in a single atomic operation, and an access to an aligned 
quadword must also be done in a single atomic operation. 


In this context, “atomic” means that if different processors do simultaneous reads 
and writes of the same data, it must not be possible to observe a partial write of the 
subject longword or quadword. 


5.2.3 Width of Memory Access 


Subject to the granularity, ordering, and coherency constraints given in Sections 
5.2.1, 5.2.2, and 5.6, accesses to physical memory may be freely cached, buffered, 
and prefetched. 


A processor may read more physical memory data (such as a full cache block) than 
is actually accessed, writes may trigger reads, and writes may write back more data 
than is actually updated. A processor may elide multiple reads and/or writes to the 
same data. 

5.2.4 Memory-Like Behavior


A memory-like region obeys the following rules:


• Each page frame in the region either exists in its entirety or does not exist in its
entirety; there are no holes within a page frame.
• All locations that exist are read/write.
• A write to a location followed by a read from that location returns precisely the
bits written; all bits act as memory.
• A write to one location does not change any other location.
• Reads have no side effects.
• Longword access granularity is provided.
• Instruction-fetch is supported.
• Load-locked and store-conditional are supported.


Non-memory-like regions may have much more arbitrary behavior:


• Unimplemented locations or bits may exist anywhere.
• Some locations or bits may be read-only and others write-only.
• Address ranges may overlap, such that a write to one location changes the bits
read from a different location.
• Reads may have side effects, although this is strongly discouraged.
• Longword granularity need not be supported.
• Instruction-fetch need not be supported.
• Load-locked and store-conditional need not be supported.


HARDWARE/SOFTWARE COORDINATION NOTE 

The details of such behavior are outside the scope
of the Alpha architecture. Specific processor and
I/O device implementations may choose and document
whatever behavior they need. It is the responsibility of
system designers to impose enough consistency to allow
processors successfully to access matching non-memory
devices in a coherent way.


5.3 Translation Buffers and Virtual Caches 


A system may choose to include a virtual instruction cache (virtual I-cache) or a
virtual data cache (virtual D-cache). A system may also choose to include either
a combined data and instruction Translation Buffer (TB) or separate data and
instruction TBs (DTB and ITB). The contents of these caches and/or translation
buffers may become invalid, depending on what operating system activity is being
performed.


Whenever a nonsoftware field of a valid Page Table Entry (PTE) is modified, copies
of that PTE must be made coherent. PALcode mechanisms are available to clear all
TBs, both DTB and ITB entries for a given VA, either DTB or ITB entries for a given
VA, or all entries with the Address Space Match (ASM) bit clear. Virtual D-cache
entries are made coherent whenever the corresponding DTB entry is requested to
be cleared by any of the appropriate PALcode mechanisms. Virtual I-cache entries
can be made coherent via the CALL_PAL IMB instruction.


If a processor implements address space numbers (ASNs), and the old PTE has 
the address space match (ASM) bit clear (ASNs in use) and the valid bit set, then 
entries can also effectively be made coherent by assigning a new, unused ASN to 
the currently running process and not reusing the previous ASN before calling the 
appropriate PALcode routine to invalidate the Translation Buffer (TB). 


In a multiprocessor environment, making the TBs and/or caches coherent on only
one processor is not always sufficient. An operating system must arrange to perform
the above actions on each processor that could possibly have copies of the PTE or
data for any affected page.


5.4 Caches and Write Buffers


A hardware implementation may include mechanisms to reduce memory access time 
by making local copies of recently used memory contents (or those expected to be 
used) or by buffering writes to complete at a later time. Caches and write buffers are 
examples of these mechanisms. They must be implemented so that their existence 
is transparent to software (except for timing, error reporting/control/recovery, and
modification to the I-stream).


The following requirements must be met by all cache/write-buffer implementations. 
All processors must provide a coherent view of memory. 


1. Write buffers may be used to delay and aggregate writes. From the viewpoint 
of another processor, buffered writes appear not to have happened yet. (Write 
buffers must not delay writes indefinitely. See Section 5.6.1.9.) 


2. Write-back caches must be able to detect a later write from another processor
and invalidate or update the cache contents.


3. A processor must guarantee that a data store to a location followed by a data
load from the same location reads the updated value.


4. Cache prefetching is allowed, but virtual caches must not prefetch from invalid
pages.


5. A processor must guarantee that all of its previous writes are visible to all other 
processors before a HALT instruction completes. A processor must guarantee 
that its caches are coherent with the rest of the system before continuing from 
a HALT. 

6. If battery backup is supplied, a processor must guarantee that the memory
system remains coherent across a powerfail/recovery sequence. Data that was
written by the processor before the powerfail may not be lost, and any caches
must be in a valid state before (and if) normal instruction processing is continued
after power is restored.


7. Virtual instruction caches are not required to notice modifications of the virtual 
I-stream (they need not be coherent with the rest of memory). Software that 
creates or modifies the instruction stream must execute a CALL_PAL IMB before 
trying to execute the new instructions. 


For example, if two different virtual addresses, VA1 and VA2, map to the same
page frame, a store to VA1 modifies the virtual I-stream fetched via VA2.


However, the sequence:

1. Change the mapping of an I-stream page from valid to invalid, then

2. Copy the corresponding page frame to a new page frame, then

3. Change the original mapping to be valid and point to the new page frame

does not modify the virtual I-stream (this might happen in soft page faults).


8. Physical instruction caches are not required to notice modifications of the 
physical I-stream (they need not be coherent with the rest of memory), except for 
certain paging activity. (See Section 5.6.1.9.) Software that creates or modifies 
the instruction stream must execute a CALL_PAL IMB before trying to execute 
the new instructions. 


In this context, to “modify the physical I-stream” means any Store to the same
physical address that is subsequently fetched as an instruction.


In this context, to “modify the virtual I-stream” means any Store to the same physical 
address that is subsequently fetched as an instruction via some corresponding 
(virtual address, ASN) pair, or to change the virtual-to-physical address mapping 
so that different values are fetched. 


5.5 Data Sharing 


In a multiprocessor environment, writes to shared data must be synchronized by the 
programmer. 


5.5.1 Atomic Change of a Single Datum 


The ordinary STL and STQ instructions can be used to perform an atomic change 
of a shared aligned longword or quadword. (“Change” means that the new value is 
not a function of the old value.) In particular, an ordinary STL or STQ instruction 
can be used to change a variable that could be simultaneously accessed via an
LDx_L/STx_C sequence.

5.5.2 Atomic Update of a Single Datum


The load-locked/store-conditional instructions may be used to perform an atomic
update of a shared aligned longword or quadword. (“Update” means that the new
value is a function of the old value.)


The following sequence performs a read-modify-write operation on location x. Only
register-to-register operate instructions and branch fall-throughs may occur in the
sequence:


try_again:
        LDQ_L   R1,x
        <modify R1>
        STQ_C   R1,x
        BEQ     R1,no_store
        :
        :
no_store:
        <code to check for excessive iterations>
        BR      try_again


If this sequence runs with no exceptions or interrupts, and no other processor writes
to location x (more precisely, the locked range including x) between the LDQ_L and 
STQ_C instructions, then the STQ_C shown in the example stores the modified value 
in x and sets R1 to 1. If, however, the sequence encounters exceptions or interrupts 
that eventually continue the sequence, or another processor writes to x, then the 
STQ_C does not store and sets R1 to 0. In this case, the sequence is repeated via 
the branches to no_store and try_again. This repetition continues until the reasons 
for exceptions or interrupts are removed, and no interfering store is encountered. 


To be useful, the sequence must be constructed so that it can be replayed an arbitrary 
number of times, giving the same result values each time. A sufficient (but not 
necessary) condition is that, within the sequence, the set of operand destinations 
and the set of operand sources are disjoint. 


NOTE
A sufficiently long instruction sequence between LDQ_L
and STQ_C will never complete, because periodic
timer interrupts will always occur before the sequence
completes. The rules in Appendix A describe
sequences that will eventually complete in all Alpha
implementations.


This load-locked/store-conditional paradigm may be used whenever an atomic update 
of a shared aligned quadword is desired, including getting the effect of atomic byte 
writes. 
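
The retry behavior of the load-locked/store-conditional sequence can be sketched with a toy model. The Python code below is an illustration under stated assumptions, not a description of any implementation: the class, the per-processor lock flag dictionary, and the `interfere` hook are hypothetical, and the locked range is reduced to a single quadword.

```python
class Memory:
    """One shared quadword plus a lock flag per processor (toy model)."""

    def __init__(self, value: int = 0) -> None:
        self.value = value
        self.lock_flag = {}              # processor id -> bool

    def load_locked(self, cpu: int) -> int:
        self.lock_flag[cpu] = True
        return self.value

    def store(self, cpu: int, value: int) -> None:
        """Ordinary store: clears every other processor's lock flag."""
        self.value = value
        for other in self.lock_flag:
            if other != cpu:
                self.lock_flag[other] = False

    def store_conditional(self, cpu: int, value: int) -> int:
        """Store and return 1 if the lock flag survived, else return 0."""
        if not self.lock_flag.get(cpu, False):
            return 0
        self.lock_flag[cpu] = False
        self.store(cpu, value)
        return 1


def atomic_update(mem, cpu, modify, interfere=None, max_tries=10):
    """Retry LDQ_L / <modify> / STQ_C until the store succeeds."""
    for attempt in range(1, max_tries + 1):
        old = mem.load_locked(cpu)
        if interfere is not None:
            interfere(attempt)           # models a write by another processor
        if mem.store_conditional(cpu, modify(old)):
            return attempt               # number of attempts that were needed
    raise RuntimeError("excessive iterations")
```

With `mem = Memory(5)` and an `interfere` hook that stores 100 from another processor on the first attempt only, the first store-conditional fails, the sequence is replayed against the new value, and the update completes on the second attempt with the location holding 101.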


5.5.3 Atomic Update of Data Structures 


Before accessing shared writable data structures (those that are not a single aligned 
longword or quadword), the programmer can acquire control of the data structure 
by using an atomic update to set a software lock variable. Such a software lock can 
be cleared with an ordinary store instruction. 

A software-critical section, therefore, may look like the sequence:


stq_c_loop:
spin_loop:
        LDQ_L   R1,lock_variable        ; \
        BLBS    R1,already_set          ;  \
        OR      R1,#1,R2                ;   > Set lock bit
        STQ_C   R2,lock_variable        ;  /
        BEQ     R2,stq_c_fail           ; /
        MB
        <critical section: updates various data structures>
        MB
        STQ     R31,lock_variable       ; Clear lock bit
        :
        :
already_set:
        <code to block or reschedule or test for too many iterations>
        BR      spin_loop
stq_c_fail:
        <code to test for too many iterations>
        BR      stq_c_loop


This code has a number of subtleties:


1. If the lock_variable is already set, the spin loop is done without doing any stores.
This avoidance of stores improves memory subsystem performance and avoids
the deadlock described below.


2. If the lock_variable is actually being changed from 0 to 1, and the STQ_C fails
(due to an interrupt, or because another processor simultaneously changed
lock_variable), the entire process starts over by reading the lock_variable again.


3. Only the fall-through path of the BLBS does a STx_C; some implementations
may not allow a successful STx_C after a branch-taken.


4. Only register-to-register operate instructions are used to do the modify.


5. Both conditional branches are forward branches, so they are properly predicted
not to be taken (to match the common case of no contention for the lock).


6. The OR writes its result to a second register; this allows the OR and the BLBS
to be interchanged if that would give a faster instruction schedule.


7. Other operate instructions (from the critical section) may be scheduled into
the LDQ_L..STQ_C sequence, so long as they do not fault or trap, and they
give correct results if repeated; other memory or operate instructions may be
scheduled between the STQ_C and BEQ.


8. The MB instructions are discussed in Section 5.5.4.


9. An ordinary STQ instruction is used to clear the lock_variable.


It would be a performance mistake to spin-wait by repeating the full LDQ_L..STQ_C
sequence (to move the BLBS after the BEQ) because that sequence may repeatedly
change the software lock_variable from “locked” to “locked,” with each write causing
extra access delays in all other caches that contain the lock_variable. In the extreme,
spin-waits that contain writes may deadlock as follows:


If, when one processor spins with writes, another processor is modifying (not
changing) the lock_variable, then the writes on the first processor may cause the
STx_C of the modify on the second processor always to fail.


This deadlock situation is avoided by:


• Having only one processor do a store (no STx_C), or
• Having no write in the spin loop, or
• Doing a write only if the shared variable actually changes state (1 → 1 does not
change state).
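
The deadlock described above can be reproduced in a toy model. The Python sketch below is hypothetical: the class and function names are illustrative, and it assumes a worst-case interleaving in which the spinning processor always gets one loop iteration between the other processor's LDQ_L and STQ_C. Real timing varies, so this shows only why a spin loop that writes can starve a modify sequence and a read-only spin loop cannot.

```python
class LockedWord:
    """Toy model: one quadword and a lock flag per processor."""

    def __init__(self, value):
        self.value = value
        self.flags = {}

    def ldq_l(self, cpu):
        self.flags[cpu] = True
        return self.value

    def ldq(self, cpu):
        return self.value            # ordinary load: lock flags are untouched

    def stq_c(self, cpu, value):
        if not self.flags.get(cpu, False):
            return 0                 # lock flag lost: no store is done
        self.value = value
        self.flags = {}              # a successful write clears all lock flags
        return 1


def modifier_successes(spin_with_writes, rounds=5):
    """Count successful STQ_C operations by the modifying processor.

    CPU 0 modifies other bits of the quadword that holds the lock bit;
    CPU 1 spin-waits on the lock bit (bit 0), which stays set throughout.
    """
    word = LockedWord(value=1)
    successes = 0
    for _ in range(rounds):
        old = word.ldq_l(0)                      # CPU 0: LDQ_L
        if spin_with_writes:
            seen = word.ldq_l(1)                 # CPU 1: full LDQ_L..STQ_C,
            word.stq_c(1, seen | 1)              # rewriting "locked" to "locked"
        else:
            word.ldq(1)                          # CPU 1: read-only test
        successes += word.stq_c(0, old + 2)      # CPU 0: STQ_C
    return successes
```

Under this interleaving, `modifier_successes(True)` is 0 because every write by the spinner clears the modifier's lock flag, while `modifier_successes(False)` equals the number of rounds.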


5.5.4 Ordering Considerations for Shared Data Structures 


A critical section sequence, such as shown in Section 5.5.3, is conceptually only three
steps:


1. Acquire software lock
2. Critical section -- read/write shared data
3. Clear software lock


In the absence of explicit instructions to the contrary, the Alpha architecture allows
reads and writes to be reordered. While this may allow more implementation speed
and overlap, it can also create undesired side effects on shared data structures.
Normally, the critical section just described would have two instructions added to it:


<acquire software lock> 

MB (memory barrier #1) 

<critical section -- read/write shared data> 
MB (memory barrier #2) 

<clear software lock> 


The first memory barrier prevents any reads (from within the critical section) from 
being prefetched before the software lock is acquired; such prefetched reads would 
potentially contain stale data. 


The second memory barrier prevents any reads or writes (from within the critical
section) from being delayed past the clearing of the software lock; such delayed
accesses could interact with the next user of the shared data, defeating the purpose
of the software lock entirely.


SOFTWARE NOTE 
In the VAX architecture, many instructions provide
noninterruptible read-modify-write sequences to memory
variables. Most programmers never regard data sharing
as an issue.


In the Alpha architecture, programmers must pay more
attention to synchronizing access to shared data; for
example, to AST routines. In the VAX, a programmer
can use an ADDL2 to update a variable that is shared
between a “MAIN” routine and an AST routine, if
running on a single processor. In the Alpha architecture,
a programmer must deal with AST shared data by using
multiprocessor shared data sequences.


5.6 Read/Write Ordering 


This section does not apply to programs that run on a single processor and do not 
write to the instruction stream. On a single processor, all memory accesses appear 
to happen in the order specified by the programmer. This section deals entirely with 
predictable read/write ordering across multiple processors. 


The order of reads and writes done in an Alpha implementation may differ from that 
specified by the programmer. 


For any two memory references A and B, either A must occur before B in all Alpha 
implementations, B must occur before A, or they are UNORDERED. In the last 
case, software cannot depend upon one occurring first: the order may vary from 
implementation to implementation, and even from run to run or moment to moment 
on a single implementation. 


If two references cannot be shown to be ordered by the rules given, they are 
UNORDERED and implementations are free to do them in any order that is 
convenient. Implementations may take advantage of this freedom to deliver 
substantially higher performance. 


The discussion that follows first defines the architectural issue sequence of memory
references on a single processor, then defines the (partial) ordering on this issue
sequence that all Alpha implementations are required to maintain.


The individual issue sequences on multiple processors are merged into access 
sequences at each shared memory location. The discussion defines the (partial) 
ordering on the individual access sequences that all Alpha implementations are 
required to maintain. 


The net result is that for any code that executes on multiple processors, one can 
determine which memory accesses are required to occur before others on all Alpha 
implementations and hence can write useful shared-variable software. 


Software writers can force one reference to occur before another by inserting a 
memory barrier instruction (MB or IMB) between the references. 


5.6.1 Alpha Shared Memory Model 


An Alpha system consists of a collection of processors and shared coherent memories 
that are accessible by all processors. (There may also be unshared memories, but 
they are outside the scope of this section.) 

NOTE
\Unshared example: On the PMI, some physical
addresses in I/O space access unshared processor-local
CSRs. \


A processor is an Alpha CPU or an I/O device (or anything else that gets added). 
A shared memory is the primary storage place for one or more locations. 


A location is an aligned quadword, specified by its physical address. Multiple virtual
addresses may map to the same physical address. Ordering considerations are based
only on the physical address.


IMPLEMENTATION NOTE 
An implementation may allow a location to have 
multiple physical addresses, but the rules for accesses 
via mixtures of the addresses are implementation- 
specific and outside the scope of this section. Accesses 
via exactly one of the physical addresses follow the rules 
described next. 


Each processor may generate accesses to shared memory locations. There are five 
types of accesses: 


1. Instruction fetch by processor i to location x, returning value a, denoted Pi:I(x,a).
2. Data read by processor i to location x, returning value a, denoted Pi:R(x,a).
3. Data write by processor i to location x, storing value a, denoted Pi:W(x,a).
4. Memory barrier instruction issued by processor i, denoted Pi:MB.
5. I-stream memory barrier instruction issued by processor i, denoted Pi:IMB.


The first access type is also called an I-stream access or I-fetch. The next two are 
also called D-stream accesses. The first three types collectively are called read/write 
accesses, denoted Pi:*(x,a). The last two types collectively are called barriers. 


During actual execution in an Alpha system, each processor has a time-ordered issue 
sequence of all the memory references presented by that processor (to all memory 
locations), and each location has a time-ordered access sequence of all the accesses 
presented to that location (from all processors). 


5.6.1.1 Architectural Definition of Processor Issue Sequence 


The issue sequence for a processor is architecturally defined with respect to a 
hypothetical simple implementation that contains one processor and a single shared 
memory, with no caches or buffers. This is the instruction execution model: 


1. I-fetch: An Alpha instruction is fetched from memory. 


2. Read/Write: That instruction is executed and runs to completion, including a
single data read from memory for a Load instruction or a single data write to
memory for a Store instruction.


3. Update: The PC for the processor is updated.


4. Loop: Repeat the above sequence indefinitely.


If the instruction fetch step gets a memory management fault, the I-fetch is not done
and the PC is updated to point to a PALcode fault handler. If the read/write step
gets a memory management fault, the read/write is not done and the PC is updated
to point to a PALcode fault handler.


All memory references are aligned quadwords. For the purpose of defining ordering,
aligned longword references are modeled as quadword references to the containing
aligned quadword.


5.6.1.2 Definition of Processor Issue Order


A partial ordering, called processor issue order, is imposed on the issue sequence 
defined in Section 5.6.1.1. 


For two accesses u and vu issued by processor Pi, u is said to PRECEDE v IN ISSUE 
ORDER (<) if u occurs earlier than v in the issue sequence for Pi, and either of the 
following applies: 


1. The access types are of the following issue order: 


Table 5-1: Processor Issue Order | 
Ist|/2nd— Pi:l(y,b) Pi:R(y,b) Pi:W(y,b) Pi:MB Pi:IMB 


Pi:I(x,a) < if x=y <ifx=y < < 
Pi:R(x,a) <ifx=y <ifx=y < < 
Pi:W(x,a) <ifx=y <ifx=y < < 
Pi:MB < < < < 
Pi: IMB < < < < < 


2. Or, u is a TB fill, for example, a PTE read in order to satisfy a TB miss, and v is
an I- or D-stream access using that PTE (see Section 5.6.2).


Issue order is thus a partial order imposed on the architecturally specified issue 
sequence. Implementations are free to do memory accesses from a single processor 
in any sequence that is consistent with this partial order. 


Note that accesses to different locations are ordered only with respect to barriers 
and TB fill. The table asymmetry for I-fetch allows writes to the I-stream to be 
incoherent until an IMB is executed. 
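
Rule 1 can be read as a table lookup. The sketch below encodes one reading of
Table 5-1 as a predicate; the column placement of the table entries is an
assumption, chosen to agree with the I-fetch asymmetry noted above. The names
Access and precedes_in_issue_order are hypothetical, and the TB-fill rule
(rule 2) is not modeled:

```python
from typing import NamedTuple, Optional


class Access(NamedTuple):
    kind: str                       # "I", "R", "W", "MB", or "IMB"
    location: Optional[str] = None  # None for barriers


# For each first-access kind, the second-access kinds it can precede.
ORDERED_AFTER = {
    "I":   {"I", "W", "MB", "IMB"},
    "R":   {"R", "W", "MB", "IMB"},
    "W":   {"R", "W", "MB", "IMB"},
    "MB":  {"R", "W", "MB", "IMB"},
    "IMB": {"I", "R", "W", "MB", "IMB"},
}
BARRIERS = {"MB", "IMB"}


def precedes_in_issue_order(first: Access, second: Access) -> bool:
    """Rule 1 only; `first` must be earlier than `second` in the issue sequence."""
    if second.kind not in ORDERED_AFTER[first.kind]:
        return False
    if first.kind in BARRIERS or second.kind in BARRIERS:
        return True  # ordering against a barrier is unconditional
    return first.location == second.location  # the "< if x=y" entries


# A write followed by an I-fetch of the same location is not ordered without an IMB.
print(precedes_in_issue_order(Access("W", "x"), Access("I", "x")))
print(precedes_in_issue_order(Access("W", "x"), Access("IMB")))
```

Under this reading, the first call reports that the pair is unordered and the
second reports that the write precedes the IMB.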


5.6.1.3 Definition of Memory Access Sequence


The access sequence for a location cannot be observed directly, nor fully
predicted before an actual execution, nor reproduced exactly from one execution
to another. Nonetheless, some useful ordering properties must hold in all Alpha
implementations.


5.6.1.4 Definition of Location Access Order


A partial ordering, called location access order, is imposed on the memory access
sequence defined above.


For two accesses u and v to location x, u is said to PRECEDE v IN ACCESS ORDER
(«) if u occurs earlier than v in the access sequence for x, and at least one of them
is a write:

Table 5-2: Location Access Order

1st↓ 2nd→    Pi:I(x,b)   Pi:R(x,b)   Pi:W(x,b)

Pi:I(x,a)                            «
Pi:R(x,a)                            «
Pi:W(x,a)    «           «           «


Access order is thus a partial order imposed on the actual access sequence for a
given location. Each location has a separate access order. There is no direct ordering
relationship between accesses to different locations.


Note that reads and I-fetches are ordered only with respect to writes. 


5.6.1.5 Definition of Storage


If u is Pi:W(x,a), and v is either Pj:I(x,b) or Pj:R(x,b), and u « v, and no w Pk:W(x,c)
exists such that u « w « v, then the value b returned by v is exactly the value a
written by u.


Conversely, if u is Pi:W(x,a), and v is either Pj:I(x,b) or Pj:R(x,b), and b=a (and a is
distinguishable from values written by accesses other than u), then u « v and for any
other w Pk:W(x,c) either w « u or v « w.
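
The first half of this definition can be sketched as a lookup over one location's
access sequence: a read or I-fetch returns the value of the nearest earlier write.
The tuple layout, the function name, and the explicit initial value are
assumptions made for illustration only:

```python
from typing import List, Optional, Tuple

# One entry of a location's access sequence: (processor, kind, value).
# kind is "W" for a write, or "R" / "I" for a read or I-fetch (value unused).
LocationAccess = Tuple[str, str, Optional[int]]


def value_returned(sequence: List[LocationAccess], index: int, initial: int) -> int:
    """Value seen by the read or I-fetch at `index` in one location's access sequence."""
    # Walk backward to the nearest earlier write. No other write lies between
    # that write and the read, so its value is the one returned.
    for _processor, kind, value in reversed(sequence[:index]):
        if kind == "W":
            return value
    return initial  # no earlier write: the location still holds its initial contents


x_sequence = [("Pi", "W", 2), ("Pj", "R", None), ("Pk", "W", 3), ("Pj", "R", None)]
print(value_returned(x_sequence, 1, initial=1))
print(value_returned(x_sequence, 3, initial=1))
```

Here the first read by Pj returns 2, the value written by Pi, and the second
returns 3, because the write by Pk intervenes.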


The only way to communicate information between different processors is for one to 
write a shared location and the other to read the shared location and receive the 
newly written value. (In this context, the sending of an interrupt from processor 
Pi to processor Pj is modeled as Pi writing to a location INTij, and Pj reading from
INTij.)


5.6.1.6 Relationship Between Issue Order and Access Order


If u is Pi:*(x,a), and v is Pi:*(x,b), one of which is a write, and u < v in the issue order
for processor Pi, then u « v in the access order for location x.


In other words, if two accesses to the same location are ordered on a given processor,
they are ordered in the same way at the location.


5.6.1.7 Definition of Before 


For two accesses u and v, u is said to be BEFORE v (⇐) if:


u < v or
u « v or
there exists an access w such that:


(u < w and w ⇐ v) or
(u « w and w ⇐ v).


In other words, “before” is the transitive closure over issue order and access order.


5.6.1.8 Definition of After


If u ⇐ v, then v is said to be AFTER u.


At most one of u ⇐ v and v ⇐ u is true.
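
The recursive definition of "before" can be computed as a transitive closure.
The following is a minimal sketch, assuming accesses are identified by strings
and each order is supplied as a collection of (earlier, later) pairs; the
function names are hypothetical:

```python
from itertools import product
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (u, v) means u precedes v


def before_relation(issue_order: Iterable[Pair], access_order: Iterable[Pair]) -> Set[Pair]:
    """Transitive closure of the union of issue order and access order."""
    before = set(issue_order) | set(access_order)
    changed = True
    while changed:
        changed = False
        # Join every pair (u, w) with every pair (w, v) until nothing new appears.
        for (u, w), (w2, v) in product(list(before), repeat=2):
            if w == w2 and (u, v) not in before:
                before.add((u, v))
                changed = True
    return before


def is_consistent(before: Set[Pair]) -> bool:
    """True when no pair of accesses is ordered both ways."""
    return not any((v, u) in before for (u, v) in before)


before = before_relation(issue_order=[("u", "w")], access_order=[("w", "v")])
print(("u", "v") in before, is_consistent(before))
```

A set of orderings for which is_consistent returns False describes an execution
that cannot occur, since it would order some pair of accesses both ways.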


5.6.1.9 Timeliness 


Even in the absence of a barrier after the write, a write by one processor to a given 
location may not be delayed indefinitely in the access order for that location. 


5.6.2 Litmus Tests 


Many issues about writing and reading shared data can be cast into questions about 
whether a write is before or after a read. These questions can be answered by 
rigorously applying the ordering rules described previously to demonstrate whether 
the accesses in question are ordered at all. 


Assume, in the litmus tests below, that initially all memory locations contain 1.


5.6.2.1 Litmus Test 1 (Impossible Sequence)


Pi                  Pj
[U1] Pi:W(x,2)      [V1] Pj:R(x,2)
                    [V2] Pj:R(x,1)


V1 reading 2 implies U1 « V1, by the definition of storage
V2 reading 1 implies V2 « U1, by the definition of storage
V1 < V2, by the definition of issue order


The first two orderings imply that V2 ⇐ V1, whereas the last implies that V1 ⇐ V2.
Both implications cannot be true. Thus, once a processor reads a new value from a
location, it must never see an old value; time must not go backward. V2 must read
2.


5.6.2.2 Litmus Test 2 (Impossible Sequence)


Pi                  Pj
[U1] Pi:W(x,2)      [V1] Pj:W(x,3)
                    [V2] Pj:R(x,2)
                    [V3] Pj:R(x,3)


V2 reading 2 implies V1 ⇐ U1
V3 reading 3 implies U1 ⇐ V1


Both implications cannot be true. Thus, once a processor reads a new value written
by U1, any other writes that must precede the read must also precede U1. V3 must
read 2.


5.6.2.3 Litmus Test 3 (Impossible Sequence) 


Pi                  Pj                  Pk
[U1] Pi:W(x,2)      [V1] Pj:W(x,3)      [W1] Pk:R(x,3)
[U2] Pi:R(x,3)                          [W2] Pk:R(x,2)


U2 reading 3 implies U1 ⇐ V1
W2 reading 2 implies V1 ⇐ U1


Both implications cannot be true. Again, time cannot go backward. If U2 reads 3
then W2 must read 3. Alternately, if W2 reads 2, then U2 must read 2.


5.6.2.4 Litmus Test 4 (Sequence Okay) 


Pi                  Pj
[U1] Pi:W(x,2)      [V1] Pj:R(y,2)
[U2] Pi:W(y,2)      [V2] Pj:R(x,1)


There are no conflicts in this sequence. U2 ⇐ V1 and V2 ⇐ U1. U1 and U2 are not
ordered with respect to each other. V1 and V2 are not ordered with respect to each
other. There is no conflicting implication that U1 ⇐ V2.


5.6.2.5 Litmus Test 5 (Sequence Okay) 


Pi                  Pj
[U1] Pi:W(x,2)      [V1] Pj:R(y,2)
                    [V2] Pj:MB
[U2] Pi:W(y,2)      [V3] Pj:R(x,1)


There are no conflicts in this sequence. U2 ⇐ V1 ⇐ V3 ⇐ U1. There is no conflicting
implication that U1 ⇐ U2.


5.6.2.6 Litmus Test 6 (Sequence Okay) 


Pi                  Pj
[U1] Pi:W(x,2)      [V1] Pj:R(y,2)
[U2] Pi:MB
[U3] Pi:W(y,2)      [V2] Pj:R(x,1)


There are no conflicts in this sequence. V2 ⇐ U1 ⇐ U3 ⇐ V1. There is no conflicting
implication that V1 ⇐ V2.


In scenarios 4, 5, and 6, writes to two different locations x and y are observed
(by another processor) to occur in the opposite order from that in which they were
performed. An update to y propagates quickly to Pj, but the update to x is delayed,
and Pi and Pj do not both have MBs.

5.6.2.7 Litmus Test 7 (Impossible Sequence) 


Pi                  Pj
[U1] Pi:W(x,2)      [V1] Pj:R(y,2)
[U2] Pi:MB          [V2] Pj:MB
[U3] Pi:W(y,2)      [V3] Pj:R(x,1)


V1 reading 2 implies U3 ⇐ V1
V3 reading 1 implies V3 ⇐ U1
But, by transitivity, U1 ⇐ U3 ⇐ V1 ⇐ V3


Both cannot be true, so if V1 reads 2, then V3 must also read 2. 


5.6.2.8 Litmus Test 8 (Impossible Sequence)


Pi                  Pj
[U1] Pi:W(x,2)      [V1] Pj:W(y,2)
[U2] Pi:MB          [V2] Pj:MB
[U3] Pi:R(y,1)      [V3] Pj:R(x,1)


U3 reading 1 implies U3 ⇐ V1
V3 reading 1 implies V3 ⇐ U1
But, by transitivity, U1 ⇐ U3 ⇐ V1 ⇐ V3


Both cannot be true, so if U3 reads 1, then V3 must read 2, and vice versa. 


5.6.2.9 Litmus Test 9 (Impossible Sequence) 


Pi                  Pj
[U1] Pi:W(x,2)      [V1] Pj:W(x,3)
[U2] Pi:R(x,2)      [V2] Pj:R(x,3)
[U3] Pi:R(x,3)      [V3] Pj:R(x,2)


V3 reading 2 implies U1 ⇐ V3
V2 ⇐ V3 and V2 reading 3 implies V2 ⇐ U1
V1 ⇐ V2 and V2 ⇐ U1 implies V1 ⇐ U1


U3 reading 3 implies V1 ⇐ U3
U2 ⇐ U3 and U2 reading 2 implies U2 ⇐ V1
U1 ⇐ U2 and U2 ⇐ V1 implies U1 ⇐ V1


Both V1 ⇐ U1 and U1 ⇐ V1 cannot be true. Time cannot go backwards. If V3 reads
2, then U3 must read 2. Alternatively, if U3 reads 3, then V3 must read 3.


5.6.3 Implied Barriers 


In Alpha, there are no implied barriers. If an implied barrier is needed for 
functionally correct access to shared data, it must be written as an explicit 
instruction. (Software must explicitly include any needed MB or IMB instructions.) 


Alpha transitions such as the following have no built-in implied memory barriers: 
• Entry to PALcode

• Sending and receiving interrupts

• Returning from exceptions, interrupts, or machine checks

• Swapping context

• Invalidating the Translation Buffer (TB)


Depending on implementation choices for maintaining cache coherency, some
PAL/cache implementations may have an implied IMB in the I-stream TB fill routine,
but this is transparent to the non-PAL programmer.


5.6.4 Implications for Software


Software must explicitly include MB or IMB instructions in the following 
circumstances. 


5.6.4.1 Single-Processor Data Stream 


No barriers are ever needed. A read to physical address x will always return 
the value written by the immediately preceding write to x in the processor issue 
sequence. 


5.6.4.2 Single-Processor Instruction Stream 


An I-fetch from virtual or physical address x does not necessarily return the value 
written by the immediately preceding write to x in the issue sequence. To make 
the I-fetch reliably get the newly written instruction, an IMB is needed between the 
write and the I-fetch. 


5.6.4.3 Multiple-Processor Data Stream (Including Single Processor with DMA I/O) 


The only way to communicate shared data reliably is to write the shared data on one
processor, then do an MB on that processor, then write a flag (equivalently, send an 
interrupt) signaling the other processor that the shared data is ready. Each receiving 
processor must read the new flag (equivalently, receive the interrupt), then do an 
MB, then read or update the shared data. 


Leaving out the first MB removes the assurance that the shared data is written
before the flag is.


Leaving out the second MB removes the assurance that the shared data is read or
updated only after the flag is seen to change; in this case, an early read could see
an old value, and an early update could be overwritten.
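
The effect of each MB can be explored with a small exhaustive model. The sketch
below is a simplification, not an architectural definition: it assumes that every
execution can be described as a single total order of accesses that respects each
processor's issue order for reads, writes, and MBs, and it ignores I-fetches and
TB fills. The location names data and flag, the initial value 0, and all function
names are hypothetical:

```python
from typing import Dict, List, Set, Tuple

Op = Tuple  # ("W", location, value), ("R", location), or ("MB",)


def must_precede(first: Op, second: Op) -> bool:
    """Issue-order rule for data accesses: an MB is ordered against everything,
    and two accesses to the same location stay in issue-sequence order."""
    if first[0] == "MB" or second[0] == "MB":
        return True
    return first[1] == second[1]


def outcomes(programs: List[List[Op]], initial: int = 0) -> Set[Tuple[int, ...]]:
    """Every combination of read results reachable under the simplified model."""
    results: Set[Tuple[int, ...]] = set()
    total = sum(len(program) for program in programs)

    def step(done: List[Set[int]], memory: Dict[str, int],
             reads: Dict[Tuple[int, int], int]) -> None:
        if sum(len(d) for d in done) == total:
            results.add(tuple(reads[key] for key in sorted(reads)))
            return
        for p, program in enumerate(programs):
            for i, op in enumerate(program):
                if i in done[p]:
                    continue
                # An access is performed only after every earlier access
                # that precedes it in issue order has been performed.
                if any(j not in done[p] and must_precede(program[j], op)
                       for j in range(i)):
                    continue
                next_done = [set(d) for d in done]
                next_done[p].add(i)
                next_memory, next_reads = dict(memory), dict(reads)
                if op[0] == "W":
                    next_memory[op[1]] = op[2]
                elif op[0] == "R":
                    next_reads[(p, i)] = memory.get(op[1], initial)
                step(next_done, next_memory, next_reads)

    step([set() for _ in programs], {}, {})
    return results


writer_mb = [("W", "data", 42), ("MB",), ("W", "flag", 1)]
reader_mb = [("R", "flag"), ("MB",), ("R", "data")]
reader_no_mb = [("R", "flag"), ("R", "data")]

# Each result is (flag, data) as seen by the reader; (1, 0) means the flag
# was seen set but the data read was stale.
print((1, 0) in outcomes([writer_mb, reader_mb]))
print((1, 0) in outcomes([writer_mb, reader_no_mb]))
```

With both MBs present the stale result (1, 0) is not reachable in this model;
removing the reader's MB makes it reachable, because the read of data may then be
performed before the read of flag.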


This implies that after a CPU has prepared some data buffer to be read from memory 
by a DMA I/O device (such as writing a buffer to disk), it must do an MB before 
starting the I/O, and the I/O device after receiving the start signal must logically do 
an MB before reading the data buffer. 


This also implies that after a DMA I/O device has written some data to memory 
(such as paging in a page from disk), the DMA device must logically do an MB 
before posting a completion interrupt, and the interrupt handler software must do 
an MB before the data is guaranteed to be visible to the interrupted processor. Other 
processors must also do MBs before they are guaranteed to see the new data. 


An important special case occurs when a write is done (perhaps by an I/O device) to 
some physical page frame, then an MB, then a previously invalid PTE is changed 
to be a valid mapping of the physical page frame that was just written. In this 
case, all processors that access using the newly valid PTE must guarantee to deliver 
the newly written data after the TB miss, for both I-stream and D-stream accesses. 
\ This can perhaps be done in TB-miss PALcode.\ 


5.6.4.4 Multiple-Processor Instruction Stream (Including Single Processor with DMA I/O)


The only way to update the I-stream reliably is to write the shared I-stream on one 
processor, then do an IMB (MB if the writing processor is not going to execute the 
new I-stream) on that processor, then write a flag (equivalently, send an interrupt) 
signaling the other processor that the shared I-stream is ready. Each receiving 
processor must read the new flag (equivalently, receive the interrupt), then do an 
IMB, then fetch the shared I-stream. 


Leaving out the first IMB(MB) removes the assurance that the shared I-stream is 
written before the flag is. 


Leaving out the second IMB removes the assurance that the shared I-stream is read 
only after the flag is seen to change; in this case, an early read could see an old 
value. 


This implies that after a DMA I/O device has written some I-stream to memory (such 
as paging in a page from disk), the DMA device must logically do an IMB(MB) before 
posting a completion interrupt, and the interrupt handler software must do an IMB 
before the I-stream is guaranteed to be visible to the interrupted processor. Other 
processors must also do IMBs before they are guaranteed to see the new I-stream.


An important special case occurs when a write is done (perhaps by an I/O device)
to some physical page frame, then an IMB(MB), then a previously invalid PTE is
changed to be a valid mapping of the physical page frame that was just written. In
this case, all processors that access using the newly valid PTE must guarantee to
deliver the newly written I-stream after the TB miss.


5.6.4.5 Multiple-Processor Context Switch


If a process migrates from executing on one processor to executing on another, the 
context switch operating system code must include a number of barriers. 


A process migrates by having its context stored into memory, then eventually having 
that context reloaded on another processor. In between, some shared mechanism 
must be used to communicate that the context saved in memory by the first processor 
is available to the second processor. This could be done by using an interrupt, by 
using a flag bit associated with the saved context, or by using a shared-memory 
multiprocessor data structure, as follows: 


First Processor                        Second Processor


Save state of current process.
MB [1]
Pass ownership of process context  =>  Pick up ownership of process context
data structure memory.                 data structure memory.
                                       MB [2]
                                       Restore state of new process context data
                                       structure memory.
                                       Make I-stream coherent [3].
                                       Make TB coherent [4].
                                       Execute code for new process that
                                       accesses memory that is not common to
                                       all processes.


MB [1] ensures that the writes done to save the state of the current process happen 
before the ownership is passed. 


MB [2] ensures that the reads done to load the state of the new process happen 
after the ownership is picked up and hence are reliably the values written by the 
processor saving the old state. Leaving this MB out makes the code fail if an old 
value of the context remains in the second processor’s cache and invalidates from 
the writes done on the first processor are not delivered soon enough. 


The TB on the second processor must be made coherent with any write to the page 
tables that may have occurred on the first processor just before the save of the process 
state. This must be done with a series of TB invalidate instructions to remove any 
nonglobal page mapping for this process, or by assigning an ASN that is unused on 
the second processor to the process. One of these actions must occur sometime before 
starting execution of the code for the new process that accesses memory (instruction 
or data) that is not common to all processes. A common method is to assign a new 
ASN after gaining ownership of the new process and before loading its context, which 
includes its ASN.


The D-cache on the second processor must be made coherent with any write to the
D-stream that may have occurred on the first processor just before the save of process
state. This is ensured by MB [2] and does not require any additional instructions.


The I-cache on the second processor must be made coherent with any write to the 
I-stream that may have occurred on the first processor just before the save of process 
state. This can be done with an IMB PAL call sometime before the execution of any
code that is not common to all processes. More commonly, this can be done by forcing
a TB miss (via the new ASN or via TB invalidate instructions) and using the
TB-fill rule (see Section 5.6.4.3). This latter approach does not require any additional
instruction.


Combining all these considerations gives: 


First Processor                        Second Processor


Pick up ownership of process
context data structure memory.
MB
Assign new ASN or invalidate TBs.
Save state of current process.
Restore state of new process.
MB
Pass ownership of process context  =>  Pick up ownership of new process context
data structure memory.                 data structure memory.
                                       MB
                                       Assign new ASN or invalidate TBs.
                                       Save state of current process.
                                       Restore state of new process.
                                       MB
                                       Pass ownership of old process context
                                       data structure memory.
                                       Execute code for new process that
                                       accesses memory that is not common to
                                       all processes.


Note that on a single processor there is no need for the barriers. 


5.6.4.6 Multiple-Processor Send/Receive Interrupt 


If one processor writes some shared data, then sends an interrupt to a second 
processor, and that processor receives the interrupt, then accesses the shared data, 
the sequence from Section 5.6.4.3 must be used:


First Processor        Second Processor


Write data
MB
Send int.          =>  Receive int.
                       MB
                       Access data


Leaving out the MB at the beginning of the interrupt-receipt routine makes the 
code fail if an old value of the context remains in the second processor’s cache and 
invalidates from the writes done on the first processor are not delivered soon enough.


5.6.5 Implications for Hardware


The coherency point for physical address x is the place in the memory subsystem at
which accesses to x are ordered. It may be at a main memory board, or at a cache 
containing x exclusively, or at the point of winning a common bus arbitration. 


The coherency point for x may move with time, as exclusive access to x migrates 
between main memory and various caches. 


MB and IMB force all preceding writes to at least reach their respective coherency 
points. This does not mean that main-memory writes have been done, just that the 
order of the eventual writes is committed. For example, on the XMI with retry, this 
means getting the writes acknowledged as received with good parity at the inputs 
to memory board queues; the actual RAM write happens later. 


MB and IMB also force all queued cache invalidates to be delivered to the local 
caches before starting any subsequent reads (that may otherwise cache hit on stale 
data) or writes (that may otherwise write the cache, only to have the write effectively 
overwritten by a late-delivered invalidate). 


Implementations may allow reads of x to hit (by physical address) on pending writes 
in a write buffer, even before the writes to x reach the coherency point for x. If this 
is done, it is still true that no earlier value of x may subsequently be delivered to 
the processor that took the hit on the write buffer value. 
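
A toy model may help picture a read hitting on a pending write. In the sketch
below, a dictionary stands in for the coherency point and a queue holds writes
that have not yet reached it; the class and method names are hypothetical, and
real write buffers are considerably more involved:

```python
from collections import deque
from typing import Deque, Dict, Tuple


class WriteBuffer:
    """Pending writes queued in front of a dictionary standing in for memory."""

    def __init__(self, memory: Dict[int, int]):
        self.memory = memory
        # (physical address, value) pairs, oldest first
        self.pending: Deque[Tuple[int, int]] = deque()

    def write(self, address: int, value: int) -> None:
        self.pending.append((address, value))

    def read(self, address: int) -> int:
        # Hit on the newest pending write to this address, if there is one,
        # so the processor never sees a value older than its own write.
        for pending_address, value in reversed(self.pending):
            if pending_address == address:
                return value
        return self.memory[address]

    def drain(self) -> None:
        """Push all pending writes out, oldest first, as a barrier would force."""
        while self.pending:
            address, value = self.pending.popleft()
            self.memory[address] = value


memory = {0x100: 1}
buffer = WriteBuffer(memory)
buffer.write(0x100, 2)
print(buffer.read(0x100), memory[0x100])  # buffered value, then value at memory
buffer.drain()
print(memory[0x100])
```

Before the drain, the local read already returns the buffered value while memory
still holds the old one; after the drain both agree.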


Virtual data caches are allowed to deliver data before doing address translation, but
only if there cannot be a pending write under a synonym virtual address. Lack of a
write-buffer match on untranslated address bits is sufficient to guarantee this.


Virtual data caches must invalidate or otherwise become coherent with the new value
whenever a PALcode routine is executed that affects the validity, fault behavior,
protection behavior, or virtual-to-physical mapping specified for one or more pages.
Becoming coherent can be delayed until the next subsequent MB instruction or TB
fill (using the new mapping), if the implementation of the PALcode routine always
forces a subsequent TB fill.


5.7 Arithmetic Traps


Alpha implementations are allowed to execute multiple instructions concurrently 
and to forward results from one instruction to another. Thus, when an arithmetic 
trap is detected, the PC may have advanced an arbitrarily large number of 
instructions past the instruction T (calculating result R) whose execution triggered 
the trap.


When the trap is detected, any or all of these subsequent instructions may run to 
completion before the trap is actually taken. Instruction T and the set of instructions 
subsequent to T that complete before the trap is taken are collectively called the trap 
shadow of T. The PC pushed on the stack when the trap is taken is the PC of the 
first instruction past the trap shadow. 


The instructions in the trap shadow of T may use the undefined result R of T, they 
may generate additional traps, and they may completely change the PC (branches, 
JSR). 


Thus, by the time a trap is taken, the PC pushed on the stack may bear no useful 
relationship to the PC of the trigger instruction T, and the state visible to the 
programmer may have been updated using the undefined result R. If an instruction 
in the trap shadow of T uses R to calculate a subsequent register value, that register 
value is undefined, even though there may be no trap associated with the subsequent
calculation. Similarly: 


• If an instruction in the trap shadow of T stores R or any subsequent undefined
  result, the stored value is undefined.


• If an instruction in the trap shadow of T uses R or any subsequent undefined
  result as the basis of a conditional or calculated branch, the branch target is
  undefined.


• If an instruction in the trap shadow of T uses R or any subsequent undefined
  result as the basis of an address calculation, the memory address actually
  accessed is undefined.


Software that is intended to bound how far the PC may advance before taking a trap, 
or how far an undefined result may propagate, must insert TRAPB instructions at 
appropriate points. 
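
The propagation of an undefined result, and the way a TRAPB bounds it, can be
sketched for straight-line code. This is a simplified model under stated
assumptions: instructions are (mnemonic, destination, sources) tuples, there are
no branches or stores, and in the worst case every instruction up to the next
TRAPB completes before the trap is taken. The function name and the sample code
sequence are hypothetical:

```python
from typing import List, Optional, Set, Tuple

# (mnemonic, destination register or None, source registers)
Instruction = Tuple[str, Optional[str], Tuple[str, ...]]


def undefined_registers(code: List[Instruction], trigger: int) -> Set[str]:
    """Registers that may hold undefined values if instruction `trigger` traps."""
    undefined = {code[trigger][1]}  # result R of the trigger instruction T
    for mnemonic, destination, sources in code[trigger + 1:]:
        if mnemonic == "TRAPB":
            break  # the trap shadow cannot extend past the barrier
        # A value computed from an undefined value is itself undefined.
        if destination is not None and undefined.intersection(sources):
            undefined.add(destination)
    return undefined


code = [
    ("ADDQ/V", "R1", ("R2", "R3")),  # T: may take an integer overflow trap
    ("MULQ",   "R4", ("R1", "R5")),  # uses R, so its result is undefined
    ("ADDQ",   "R6", ("R7", "R8")),  # independent of R
    ("TRAPB",  None, ()),
    ("SUBQ",   "R9", ("R4", "R6")),  # issued only after the barrier
]
print(sorted(undefined_registers(code, 0)))
```

In this example only R1 and R4 can be undefined when the trap is taken; the SUBQ
after the TRAPB is outside the trap shadow of the ADDQ/V.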


Software that is intended to continue from a trap by supplying a well-defined result 
R within an arithmetic trap handler, can do so reliably by following the rules for 
software completion code sequences given in Section 4.7.5.


5.8 \REVISION HISTORY


Revision 5.0, May 12, 1992


1. Changed DRAINT to TRAPB
2. Converted to SDML
3. Generalized OS specific PALcode instructions
4. Generalized OS specific multiprocessor context switching


Revision 4.0, March 29, 1991


1. Added Litmus Test 9
2. Explain what an excess data transfer is
3. Correct typing error in code sequence example for modification of atomic data
   structure
4. Add MB instructions to second illustrative example that specifies use of MB for
   multiple processor context switch
5. Note that MB and IMB do not guarantee timeliness
6. Removed reference to byte when specifying granularity of data transfer widths
7. Made minor changes to correct use of capitals and remove repeated words in the
   Litmus Test section


Revision 3.0, Mar 2, 1990


1. Complete rewrite of data sharing
2. Complete rewrite of read/write ordering


Revision 2.0, October 4, 1989


1. Total rewrite
2. Memory, buffer, I/O spaces removed; Physical memory regions added
3. SWP, FREEZE, and THAW removed; LDQ/L and STQ/C added
4. FAS removed; MB and NUDGE added
5. DRAIN and WAIT removed; DRAINT and /Semi-precise added


Revision 1.0, May 23, 1989


1. First Review Distribution


Chapter 6
Common PALcode Architecture (I)


6.1 PALcode


In a family of machines, both users and operating system implementors require 
functions to be implemented consistently. When functions conform to a common 
interface, the code that uses those functions can be used on several different 
implementations without modification. 


These functions range from the binary encoding of the instruction and data to the 
exception mechanisms and synchronization primitives. Some of these functions can 
be implemented cost effectively in hardware, but others are impractical to implement 
directly in hardware. These functions include low-level hardware support functions 
such as Translation Buffer miss fill routines, interrupt acknowledge, and vector 
dispatch. They also include support for privileged and atomic operations that require 
long instruction sequences. 


In the VAX, these functions are generally provided by microcode. This is not seen as 
a problem because the VAX architecture lends itself to a microcoded implementation. 


One of the goals of Alpha is that microcode will not be necessary for practical 
implementation. However, it is still desirable to provide an architected interface 
to these functions that will be consistent across the entire family of machines. The 
Privileged Architecture Library (PALcode) provides a mechanism to implement these 
functions without resorting to a microcoded machine. 


NOTE
\The hardware development groups provide and maintain
the standard PALcode for a given implementation.
The PALcode may be in ROM or loaded into RAM from
some sort of a console load device. Many of the same
trade-offs exist for PALcode that exist for microcode
around patching, loading, and booting. Also, operating
systems are free to provide their own PALcode rather
than use the version provided by the hardware group. \


6.2 PALcode Instructions and Functions 


PALcode is used to implement the following functions:


• Instructions that require complex sequencing as an atomic operation
• Instructions that require VAX-style interlocked memory access
• Privileged instructions
• Memory management control (including translation buffer (TB) management)
• Context swapping
• Interrupt and exception dispatching
• Power-up initialization and booting
• Console functions
• Emulation of instructions with no hardware support


The Alpha architecture lets these functions be implemented in standard machine 
code that is resident in main memory. PALcode is written in standard machine 
code with some implementation-specific extensions to provide access to low-level 
hardware. This lets an Alpha implementation make various design trade-offs based 
on the hardware technology being used to implement the machine. The PALcode 
can abstract these differences and make them invisible to system software. 


For example, in a MOS VLSI implementation, a small (32 entry) fully associative 
TB can be the right match to the media, given that chip area is a costly resource. 
In an ECL version, a large (1024 entry) direct-mapped TB can be used because it 
will use RAM chips and does not have fast associative memories available. This 
difference would be handled by implementation-specific versions of the PALcode on 
the two systems, both versions providing transparent TB miss service routines. The 
operating system code would not need to know there were any differences. 


Part II, Operating Systems describes the Digital-supplied Alpha Privileged
Architecture Library (PALcode) routines and environment. Other systems may use
the Digital-supplied PALcode library or architect and implement a different library of
routines. Alpha systems are required to support the replacement of Digital-defined
PALcode with an operating system-specific version.


NOTE 
\ The register conventions used are based on the Alpha 
calling standard Version 1.0. The PALcode library will 
track the Alpha calling standard changes as long as that 
is practical. \ 


6.3 PALcode Environment 


The PALcode environment differs from the normal environment in the following
ways:


• Complete control of the machine state.
• Interrupts are disabled.
• Implementation-specific hardware functions are enabled, as described below.
• I-stream memory management traps are prevented (by disabling I-stream
  mapping, mapping PALcode with a permanent TB entry, or by other
  mechanisms).


Complete control of the machine state allows all functions of the machine to be
controlled. Disabling interrupts allows the system to provide multi-instruction
sequences as atomic operations. Enabling implementation-specific hardware
functions allows access to low-level system hardware. Preventing I-stream memory
management traps allows PALcode to implement memory management functions
such as translation buffer fill.


6.4 Special Functions Required for PALcode 


PALcode uses the Alpha instruction set for most of its operations. A small number
of additional functions are needed to implement the PALcode. There are five
opcodes reserved to implement PALcode functions: PALRES0, PALRES1, PALRES2,
PALRES3 and PALRES4. These instructions produce an Illegal Instruction Trap if
executed outside the PALcode environment.


• PALcode needs a mechanism to save the current state of the machine and
  dispatch into PALcode.


• PALcode needs a set of instructions to access hardware control registers.


• PALcode needs a hardware mechanism to transition the machine from the
  PALcode environment to the non-PALcode environment. This mechanism loads
  the PC, enables interrupts, enables mapping, and disables PALcode privileges.


An Alpha implementation may also choose to provide additional functions to simplify
or improve performance of some PALcode functions. The following are some
examples:


• An Alpha implementation may include a read/write virtual function that allows
  PALcode to perform mapped memory accesses using the mapping hardware
  rather than providing the virtual-to-physical translation in PALcode routines.
  PALcode may provide a special function to do physical reads and writes and
  have the Alpha loads and stores continue to operate on virtual addresses in the
  PALcode environment.


• An Alpha implementation may include hardware assists for various functions,
  for example, saving the virtual address of a reference on a memory management
  error rather than having to generate it by simulating the effective address
  calculation in PALcode.


• An Alpha implementation may include private registers so it can function without
  having to save and restore the native general registers.


6.5 PALcode Effects on System Code 


PALcode will have one effect on system code. Because PALcode may be resident 
in main memory and maintain privileged data structures in main memory, the 
operating system code that allocates physical memory cannot use all of physical 
memory. 


The amount of memory PALcode requires is small, so the loss to the system is
negligible.


6.6 PALcode Replacement


Alpha systems are required to support the replacement of Digital-supplied PALcode
with an operating system-specific version. The following functions must be
implemented in PALcode, not directly in hardware, to facilitate replacement with
different versions.


1. Translation Buffer fill. Different operating systems will want to replace the 
Translation Buffer (TB) fill routines. The replacement routines will use different 
data structures. The page tables documented in Part II, Operating Systems will 
not be present in these systems. Therefore, no portion of the TB fill flow that 
would change with a change in page tables may be placed in hardware, unless 
it is placed in a manner that can be overridden by PALcode. 


2. Process structure. Different operating systems might want to replace the process 
context switch routines. The replacement routines will use different data 
structures. The HWPCB or PCB documented in Part II, Operating Systems will
not be present in these systems. Therefore, no portion of the context switching
flows that would change with a change in process structure may be placed in
hardware.


PALcode must be written in a modular manner that facilitates easy replacement of 
major subsections. The subsections that need to be simple to replace are: 


• Translation Buffer fill
• Process structure and context switch
• Interrupt and exception frame format and routine dispatch
• Privileged PALcode instructions


6.7 Required PALcode Instructions 


The PALcode instructions listed in Table 6-1 and Appendix C must be recognized by 
mnemonic and opcode in all operating system implementations, but the effect of each 
instruction is dependent on the implementation. The operation of these PALcode 
instructions for Digital-supplied operating system implementations is described in
Part II, Operating Systems.


Table 6-1: PALcode Instructions that Require Recognition

Mnemonic    Name

BPT         Breakpoint trap
BUGCHK      Bugcheck trap
GENTRAP     Generate trap
RDUNIQUE    Read unique value
WRUNIQUE    Write unique value



The PALcode instructions listed in Table 6-2 and described in the following sections 
must be supported by all Alpha implementations: 


Table 6-2: Required PALcode Instructions

Mnemonic    Type            Operation

DRAINA      Privileged      Drain aborts
HALT        Privileged      Halt processor
IMB         Unprivileged    I-stream memory barrier


6.7.1 Drain Aborts

Format:


CALL_PAL DRAINA !PALcode format


Operation: 


IF PS<CM> NE 0 THEN
    {privileged instruction exception}
{Stall instruction issuing until all prior
 instructions are guaranteed to complete
 without incurring aborts.}


Exceptions: 


Privileged Instruction 


Instruction mnemonics:


CALL_PAL DRAINA Drain Aborts 


Description: 


If aborts are deliberately generated and handled (such as non-existent-memory 
aborts while sizing memory or searching for I/O devices), the DRAINA instruction 
forces any outstanding aborts to be taken before continuing. 


Aborts are necessarily implementation-dependent. DRAINA stalls instruction issue 
at least until all previously-issued instructions have completed and any associated 
aborts have been signaled. For operate instructions, this will usually mean stalling 
until the result register has been written. For branch instructions, this will 
usually mean stalling until the result register and PC have been written. For 
load instructions, this will usually mean stalling until the result register has been 
written. For store instructions, this will usually mean stalling until at least the first 
level in a potentially multi-level memory hierarchy has been written. 


For load instructions, DRAINA does not necessarily guarantee that the unaccessed 
portions of a cache block have been transferred error-free before continuing. 


For store instructions, DRAINA does not necessarily guarantee that the ultimate
target location of the store has received error-free data before continuing.
An implementation-specific technique must be used to guarantee the ultimate
completion of a write in implementations that have multi-level memory hierarchies
or store-and-forward bus adapters.


6.7.2 Halt

Format: 


CALL_PAL HALT !PALcode format 


Operation: 


IF PS<CM> NE 0 THEN 
{privileged instruction exception} 


CASE {halt action} OF
    halt:               {halt}
    restart/halt:       {restart/halt}
    restart/boot/halt:  {restart/boot/halt}
    boot/halt:          {boot/halt}
ENDCASE


Exceptions: 


Privileged Instruction 


Instruction mnemonics: 


CALL_PAL HALT Halt Processor 


Description: 


The HALT instruction stops normal instruction processing, and depending on the 
HALT action setting, the processor may either enter console mode or the restart 
sequence. See Platform Section, Chapter 4. 


NOTE
\ The halt actions will be changed to match the boot and
console chapters when they are done. \


6.7.3 Instruction Memory Barrier

Format:


CALL_PAL IMB !PALcode format


Operation:


{Make instruction stream coherent with Data stream} 


Exceptions: 


None


Instruction mnemonics: 


CALL_PAL IMB I-stream Memory Barrier 


Description: 


An IMB instruction must be executed after software or I/O devices write into the 
instruction stream or modify the instruction stream virtual address mapping, and 
before the new value is fetched as an instruction. An implementation may contain 
an instruction cache that does not track either processor or I/O writes into the 
instruction stream. The instruction cache and memory are made coherent by an 
IMB instruction.


If the instruction stream is modified and an IMB is not executed before fetching an 
instruction from the modified location, it is UNPREDICTABLE whether the old or 
new value is fetched. 


The cache coherency and sharing rules are described in Chapter 5.


6.8 Revision History

Revision 5.0 May 12, 1992

1. Added list of recognition-required PALcode instructions
2. Added DRAINA to list of required PALcode instructions
3. Changed privileges enabled to complete control of the machine state
4. PALcode override for TB fill routines
5. Added HALT and IMB PALcode instructions


Revision 4.1 May 12, 1992

1. Created the chapter from Sections 1.1 through 1.6 of the V4.n SRM


Chapter 7
Console Subsystem Overview (I) 


On an Alpha system, underlying control of the system platform hardware is provided 
by a console. The console: 


1. Initializes, tests, and prepares the system platform hardware for Alpha system
software.


2. Bootstraps (loads into memory and starts the execution of) system software.


3. Controls and monitors the state and state transitions of each processor in a
multiprocessor system.


4. Provides services to system software that simplify system software control of and
access to platform hardware.


5. Provides a means for a console operator to monitor and control the system.


The console interacts with system platform hardware to accomplish the first three 
tasks. The actual mechanisms of these interactions are specific to the platform 
hardware; however, the net effects are common to all systems.


The console interacts with system software once control of the system platform 
hardware has been transferred to that software. 


The console interacts with the console operator through a virtual display device or
console terminal. The console operator may be a human being or a management
application.


Chapter 8
Input/Output (I)


8.1 Introduction 


Conceptually, Alpha systems consist of processors, memory, processor-memory 
interconnect (PMI), I/O buses, bridges, and I/O devices. 


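Figure 8-1 shows the Alpha system overview.


Figure 8-1: Alpha System Overview

[Figure not reproduced. It shows a processor, memory, a local I/O device, and a
bridge attached to the processor-memory interconnect; the bridge connects to an
I/O bus, to which two remote I/O devices are attached.]

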
As shown in Figure 8-1, processors and memory are connected by the PMI. 


A bridge connects a tightly coupled I/O bus to the system, either directly to the PMI 
or through another tightly coupled I/O bus. A tightly coupled I/O bus is one whose 
address space is accessible to the processor either directly or through an I/O mailbox. 


A bridge has at least a local side and a remote side, connected by a hose. The local 
side is electrically closer to the PMI; the remote side is electrically further. 


I/O devices can be connected to the PMI or to an I/O bus. A local device connects to
the PMI; a remote device connects to an I/O bus. 


The following sections discuss Alpha I/O operations:

• Accesses to local I/O space are discussed in Section 8.2.
• Accesses to remote I/O space are discussed in Section 8.3.
• Reads and writes to processor memory-like regions initiated by I/O devices, or
  “DMAs”, are discussed in Section 8.4.
• Processor interrupts requested by devices are discussed in Section 8.5.
• Bus-specific I/O accesses are discussed in Section 8.6.
• \ Some implementation-specific considerations are discussed in Section 8.7.
• Targetable interrupts are discussed in Section 8.8. \


8.2 Local I/O Space Access 


Local I/O space locations may appear in either memory-like or non-memory-like regions.
Local I/O space locations which appear in memory-like regions may be cached subject to
the platform cache coherency scheme. See Chapter 5.


An Alpha platform need only support atomic quadword accesses. The 
Alpha instruction architecture requires only quadword accesses. Processor 
implementations may further restrict the access granularity of local I/O space. For 
example, a given implementation could permit addressing of only cache blocks. To 
support byte or word accesses to a local device, the device must be mapped into 
a non-memory-like region with a sparse address space. The necessary mapping is 
dependent on the implementation of the processor, cache, and PMI protocol. For 
example, the four individual bytes of a longword device control register could be 
mapped into the low order byte of each of four contiguous quadwords. 
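

The sparse mapping in the example above can be sketched in Python. This is only an illustration under stated assumptions: the function names and the base address are hypothetical, the stride of one quadword per register byte follows the example in the text, and byte 0 is assumed to be the least significant byte of the longword. The real mapping depends on the processor, cache, and PMI protocol.

```python
QUADWORD_BYTES = 8  # size of one quadword in bytes


def sparse_byte_address(register_base: int, byte_index: int) -> int:
    """Address of the quadword whose low-order byte carries register byte `byte_index`.

    Assumes (hypothetically) that the four bytes of a longword register are
    spread over four contiguous, quadword-aligned locations.
    """
    if not 0 <= byte_index < 4:
        raise ValueError("a longword register has bytes 0..3")
    if register_base % QUADWORD_BYTES:
        raise ValueError("sparse region must start on a quadword boundary")
    return register_base + byte_index * QUADWORD_BYTES


def assemble_longword(quadwords):
    """Rebuild the 32-bit register value from four quadword reads.

    Only the low-order byte of each quadword is significant.
    """
    value = 0
    for i, qw in enumerate(quadwords):
        value |= (qw & 0xFF) << (8 * i)
    return value
```

With this layout, software touches one register byte per quadword access, so the quadword access granularity of the platform is never violated.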


8.2.1 Read/Write Ordering 


Access to local I/O space does not cause any implicit read/write ordering; explicit
barrier instructions must be used to ensure any desired ordering. Barrier
instructions must be used:


• After updating a memory-resident data structure and before writing a local I/O
  space location to notify the device of the updates.


• Between multiple consecutive direct accesses to local I/O space, e.g. device control
  registers, if those accesses are expected to be ordered at the device.


Again, note that implementations may cache not only memory-resident data 
structures, but also local I/O space locations. 


8.3 Remote I/O Space Access 


Remote I/O space locations are accessed indirectly through a memory-resident 
“mailbox” data structure. To post an access, the physical address of the mailbox is 
written into a MailBox Pointer Register (MBPR) on a local bridge side. For remote 
I/O space writes, the command and data are posted in the mailbox, and status is 
returned. For remote I/O space reads, the command is posted in the mailbox, and 
status and data are returned. 


An Alpha system may have any number of local bridge sides. Each local side may
provide connections for up to 256 hoses. Each hose may connect to a single remote
side or may connect to multiple remote sides. A single remote side may connect to
one or more hoses. A bridge need not include a hose; the local and remote sides
may be implemented as a single entity. A local side or an entire bridge may be
incorporated into a processor board.


8.3.1 Mailbox Posting 


A remote I/O space access is defined by the contents of the mailbox structure. A 
remote I/O space access is invoked by writing the base physical address of the 
mailbox structure into the appropriate bridge MailBox Pointer Register (MBPR). 
Each I/O bus may be associated with one and only one MBPR. A single MBPR may 
be associated with one or more remote I/O buses and a single bridge may have 
multiple MBPR registers. The MBPR appears in local I/O space. 


The MBPR is accessed only with the STQ_C instruction. Flow control is achieved 
by the associated (per-processor) lock_flag as follows: 


post_mbx:
        <derive PA of mailbox and load R1>
        <derive VA of MBPR and load R0>
        STQ_C   R1,(R0)                 ; Write mailbox PA to the MBPR
        BEQ     R1,wait_post_mbx        ; lock_flag clear: MBPR busy, retry
        <mailbox posted; continue>

wait_post_mbx:
        <backoff delay>
        BR      post_mbx


If the STQ_C lock_flag is set, the mailbox has been posted to the bridge. If the 
STQ_C lock_flag is clear, all MBPR resources are occupied; the MBPR write must be 
retried. In multi-processor configurations, this use of the STQ_C instruction affects 
only the local per-processor lock_flag. The state of the per-processor lock_flag of 
other processors is unchanged. 
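

The retry flow described above can be modelled in a few lines of Python. This is a toy sketch, not a description of any bridge: `MbprModel`, its `slots` parameter, and `post_mailbox` are hypothetical names, and the "busy" condition is reduced to a bounded list of pending mailbox addresses.

```python
class MbprModel:
    """Toy stand-in for a bridge MBPR with a bounded number of pending slots."""

    def __init__(self, slots=1):
        self.slots = slots
        self.pending = []

    def store_conditional(self, mailbox_pa):
        """Mimic STQ_C: return 1 (lock_flag set) if accepted, 0 if resources are busy."""
        if len(self.pending) >= self.slots:
            return 0
        self.pending.append(mailbox_pa)
        return 1

    def service_one(self):
        """Model the bridge retiring its oldest posted mailbox."""
        if self.pending:
            self.pending.pop(0)


def post_mailbox(mbpr, mailbox_pa, max_tries=8, backoff=None):
    """Retry the MBPR write until it is accepted; return the number of attempts."""
    for attempt in range(1, max_tries + 1):
        if mbpr.store_conditional(mailbox_pa):
            return attempt
        if backoff is not None:
            backoff(attempt)  # stands in for <backoff delay>
    raise TimeoutError("MBPR stayed busy")
```

The bounded `max_tries` is a choice made for the sketch; the assembly flow above simply loops until the write succeeds.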


HARDWARE/SOFTWARE IMPLEMENTATION NOTE 
The use above of the STQ_C instruction is specific to the 
first Alpha implementations. \(EV-3 and EV-4)\ Future 
implementations may use a different access mechanism. 
\See Section 8.7.2.\ 


A given remote I/O space location is uniformly accessible to all processors in a multi- 
processor configuration. A given hose, hence a given remote I/O bus, may be accessed 
via an MBPR at the same physical address from any processor. A software thread 
need have no knowledge of the specific processor on which it is executing. 


A FIFO structure may be implemented behind each MBPR register to permit the
posting of multiple outstanding mailbox operations. A set of processor-specific
request queues may be implemented behind each MBPR register to ensure fair access
to all processors. Any such FIFO or queue is invisible to software.



Bridge implementations must protect against lockout and ensure fair MBPR access
to all processors in a multi-processor configuration. Multiple writes to an MBPR by
a single processor must not be able to cause the starvation or timeout of competing
writes to the same MBPR by other processors.


Multiple software threads executing at different IPLs on a single processor may 
cause starvation or timeout of the lower IPL threads. IPL levels are inherently 
unfair. \See Section 8.7.3.\ 


Bridge implementations must guarantee forward progress on mailbox operations
regardless of direct memory access or interrupt load.


8.3.2 Mailbox Pointer Register (MBPR) 
The MBPR format is shown in Figure 8-2 and described in Table 8-1.


Figure 8-2: Mailbox Pointer Register Format

SBZ <63:48> | Mailbox physical address <47:6> | SBZ <5:0>


Table 8-1: Mailbox Pointer Register Format

Bit(s)     Description

<5:0>      SBZ

<47:6>     Physical address of the mailbox structure. The mailbox structure must be at
           least 64-byte aligned.

<63:48>    SBZ
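

As an illustration of Table 8-1, the following Python sketch forms the quadword that software would write to the MBPR. The function name is hypothetical; the checks simply enforce the SBZ fields and the 64-byte alignment stated in the table.

```python
MBPR_ADDR_MASK = ((1 << 48) - 1) & ~0x3F  # bits <47:6>


def make_mbpr_value(mailbox_pa: int) -> int:
    """Return the MBPR quadword for a mailbox at physical address `mailbox_pa`."""
    if mailbox_pa < 0 or mailbox_pa >> 48:
        raise ValueError("bits <63:48> should be zero (SBZ)")
    if mailbox_pa & 0x3F:
        raise ValueError("mailbox must be at least 64-byte aligned; bits <5:0> are SBZ")
    return mailbox_pa & MBPR_ADDR_MASK
```

Because the mailbox is 64-byte aligned, the physical address itself is already a valid MBPR value; the function only validates it.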


8.3.3 Mailbox Structure


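The mailbox is a 64-byte, naturally aligned, data structure. The format is shown in
Figure 8-3 and described in Table 8-2.


Figure 8-3: Mailbox Data Structure Format

Offset   Fields (most significant bits on the left)

A        SBZ <63:56> | HOSE <55:48> | SBZ <47:40> | MASK <39:32> | W <31> | B <30> | CMD <29:0>
A+8      RBADR <63:0>
A+16     WDATA <63:0>
A+24     UNPREDICTABLE <63:0>
A+32     UNPREDICTABLE <63:32> | RDATA <31:0>
A+40     STATUS <63:2> | ERR <1> | DON <0>
A+48     UNPREDICTABLE <63:0>
A+56     UNPREDICTABLE <63:0>
A+64     (end of mailbox)

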
Table 8-2: Mailbox Data Structure Format

Offset  Bit(s)    Name    Description

0       <29:0>    CMD     Remote bus command. Controls the actual remote bus
                          operation and can include fields such as address only,
                          address width, and data width. See Section 8.6.2.

        <30>      B       Remote bridge access. If set, the command is a special
                          or diagnostic command directed to the remote side. See
                          Section 8.6.3.

        <31>      W       Write access. If set, the remote bus operation is a write.

        <39:32>   MASK    Disable Byte Mask. Disables bytes within the remote bus
                          address. Mask bit <i> set causes the byte to be disabled;
                          e.g. data byte <i> will NOT be written to the remote
                          address. See Section 8.6.2.

        <47:40>           SBZ

        <55:48>   HOSE    Hose. Specifies the remote bus to be accessed. Bridges may
                          directly connect to up to 256 remote buses per hose.

        <63:56>           SBZ

8       <63:0>    RBADR   Remote Bus Address. Contains the target address of the
                          device on the remote bus. See Section 8.6.2.

16      <63:0>    WDATA   Write Data. For write commands, contains the data to be
                          written. For read commands, the field is not used by the
                          bridge.

24      <63:0>            UNPREDICTABLE.

32      <31:0>    RDATA   Read Data. For read commands, contains the data
                          returned. For write commands, the field is
                          UNPREDICTABLE.

        <63:32>           UNPREDICTABLE.

40      <0>       DON     Done. Indicates that the ERR, STATUS, and RDATA fields
                          are valid and that the mailbox structure may be safely
                          modified by host software.

        <1>       ERR     Error. If set, indicates that an error was encountered
                          and that the STATUS field contains additional information.
                          Valid only when DON is set. See Sections 8.3.7 and 8.3.8.

        <63:2>    STATUS  Operation completion status. Contains information specific
                          to the bridge implementation. Valid only when DON is set.
                          The bridge specification must include a definition of this
                          field. See Sections 8.3.7 and 8.3.8.

48      <63:0>            UNPREDICTABLE.

56      <63:0>            UNPREDICTABLE.
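

The layout in Table 8-2 can be sketched as a packing routine in Python. This is an illustration only: `pack_mailbox` and its parameter names are hypothetical, any CMD value passed in is a placeholder because valid commands are defined by each bridge specification, and a little-endian memory image is assumed.

```python
import struct


def pack_mailbox(cmd, hose, rbadr, wdata=0, mask=0, write=False, bridge=False):
    """Build the 64-byte mailbox image that software initializes before posting."""
    if not 0 <= cmd < (1 << 30):
        raise ValueError("CMD occupies bits <29:0>")
    if not 0 <= mask <= 0xFF or not 0 <= hose <= 0xFF:
        raise ValueError("MASK and HOSE are 8-bit fields")
    quad0 = (cmd
             | (int(bridge) << 30)   # B, bit <30>
             | (int(write) << 31)    # W, bit <31>
             | (mask << 32)          # MASK, bits <39:32>
             | (hose << 48))         # HOSE, bits <55:48>; SBZ fields stay zero
    # Offsets 24..56 are written as zero; zero at offset 40 clears DON/ERR/STATUS.
    return struct.pack("<8Q", quad0, rbadr, wdata, 0, 0, 0, 0, 0)
```

Only the first three quadwords carry information from software to the bridge; the remaining quadwords are zeroed here so that DON starts out clear.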


8.3.4 Mailbox Access Synchronization 


The ownership of the mailbox structure is exchanged between the posting software 
and the servicing bridge. The first 3 quadwords must be initialized by the software 
prior to posting the mailbox to the bridge. Once posted, the contents of the mailbox 
are owned by the bridge and are UNPREDICTABLE until the DON bit is set by 
the bridge. If the mailbox contents are altered by software prior to the DON 
bit becoming set, the action of the bridge and the resulting mailbox contents are 
UNPREDICTABLE. Once the DON bit has been set by the bridge, the mailbox 
contents are again owned by the software and must not be altered by the bridge. 
\See Section 8.7.4.\ 


Software use of the DON bit for synchronization is encouraged. If the DON bit is set 
in the mailbox at the time that the mailbox is posted, it is not possible to determine 
when the mailbox structure may be safely altered nor is it possible to determine 
when any returned information (RDATA or STATUS or ERR) becomes valid. Use of 
a static, not dynamically altered, mailbox structure is recommended only for true 
write-and-run of static data such as setting a “go” bit in a device control register. 


Note that the DON bit being set does NOT guarantee that a remote I/O space write has
actually completed at the device. The DON bit may be set by any intervening bridge.
See Section 8.3.8.


The servicing bridge ignores the contents of the DON, ERR, and STATUS fields;
these fields are treated as write only.


8.3.5 Mailbox Read/Write Ordering 


Mailbox accesses to a given remote bus are ordered by the MBPR and bus bridge.
After posting in the MBPR, the ordering must be retained by the bridge. The bridge 
may reorder operations only across different hoses. Mailboxes targeted to different 
buses connected to the same local bridge side may occur in a sequence different from 
the posting order. 


Mailbox operations are implicitly ordered when one and only one MBPR is used to 
access a given remote I/O bus. In general, there is only one path to a given remote 
I/O bus via a unique hose and remote side. In such configurations, the hardware 
must retain the ordering of mailbox accesses. In configurations in which there are 
multiple paths, software should order mailbox operations by using one and only one 
MBPR to access a given remote bus.


8.3.6 Remote I/O Space Access Granularity 
The granularity of remote I/O space accesses is not symmetric:

• Mailbox reads are defined for bytes, words, and longwords.
• Mailbox writes are defined for bytes, words, longwords, and quadwords.


Mailbox writes were optimized to permit efficient and atomic writes of a full 48-bit 
Alpha physical address. 


Not all bus bridges will support all possible remote I/O space access granularities. 
The supported granularity will be determined by the capabilities of the remote bus 
and the remote bus side. 


The MASK and RBADR fields are determined by the addressing and masking modes 
of the remote I/O bus. Invalid MASK fields, or invalid combinations of MASK and 
RBADR fields, will not cause ERR to be set. Error checking (if any) is done on 
the remote (I/O bus) side of the bridge; the local (PMI) side of the bridge employs 
disconnected writes. If error checking is done by the remote side of the bridge, the 
error is reported by an error interrupt. 


On mailbox write accesses, bridges (and chains of bridges) deliver the valid WDATA, 
RBADR, and MASK information to the remote I/O device. The valid data may be 
encapsulated, along with invalid data, into larger data packets; the invalid data may 
simply be invalid fields from the WDATA quadword. For some remote I/O buses, the 
RBADR and MASK fields may be truncated or otherwise mapped. 
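

The effect of the Disable Byte Mask on a write can be sketched as follows. This is a hypothetical Python illustration: it assumes that data byte <i> is bits <8i+7:8i> of WDATA, whereas the actual interpretation of MASK and RBADR is defined by each remote I/O bus.

```python
def bytes_written(wdata: int, mask: int):
    """Return {byte_index: value} for the WDATA bytes a write would deliver.

    A set bit <i> in `mask` disables data byte <i>, so that byte is omitted.
    """
    if not 0 <= mask <= 0xFF:
        raise ValueError("MASK is an 8-bit field")
    return {i: (wdata >> (8 * i)) & 0xFF
            for i in range(8)
            if not (mask >> i) & 1}
```

For example, a mask of 0xF0 disables the upper four bytes, which corresponds to a longword write of the low half of WDATA.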


On mailbox read accesses, bridges (and chains of bridges) deliver the valid RBADR,
MASK, and command information to the remote I/O device. The bridge has no
knowledge of the intended size of the read data; this is known only to the requesting
software and the device, which are assumed to agree. The valid data may be
encapsulated, along with invalid data, into larger data packets. Again, for some
remote I/O buses, the RBADR and MASK fields may be truncated or otherwise
mapped.


8.3.7 Remote I/O Space Read Accesses


The bridge must return status and data for remote I/O space reads. When the 
mailbox DON bit is set by the bridge, the operation has completed, and the ERR 
and STATUS fields may be examined. If the ERR bit is not set, the requested 
remote bus operation was successful and valid data was returned. If the ERR bit is 
set, an error was encountered and the STATUS field contains information as to the 
nature of the error. 


Errors encountered on remote I/O space read accesses may also be reported by bridge 
error interrupts. The bridge side which encounters the error requests the interrupt. 
Thus, a non-existent hose error may be reported by the local (PMI) side of the bridge, 
while a non-existent remote bus address error is reported by the remote (I/O bus) 
side of the bridge. 


Remote I/O space read accesses may be performed as follows: 


remote_read:
        <load Rm with VA of mailbox>
        <ensure mailbox no longer in use by bridge>
        <derive and load mailbox CMD, MASK, HOSE, and RBADR fields>
        STQ     R31,40(Rm)              ; Clear DON/ERR/STATUS fields
        MB                              ; First barrier (see note 2)

post_mbx:
        <derive PA of mailbox and load R1>
        <derive VA of MBPR and load R0>
        STQ_C   R1,(R0)
        BEQ     R1,wait_post_mbx

wait_mbxdone:
        LDQ     R0,40(Rm)               ; Fetch STATUS/DON
        BLBS    R0,check_err            ; Branch on DON set
        <backoff delay>
        BR      wait_mbxdone

check_err:
        SRL     R0,#1,R0                ; Move ERR into the low bit
        BLBS    R0,read_err             ; Branch on ERR set
        MB                              ; Second barrier (see note 3)
        LDQ     R0,32(Rm)               ; Fetch RDATA
        <read complete>

read_err:
        <handle error>

wait_post_mbx:
        <backoff delay>
        BR      post_mbx


Notes:


1. The mailbox is no longer in use by a bridge whenever the DON bit has been set
by the servicing bridge or the mailbox is newly allocated.


2. The first barrier is required to ensure that the bridge will read the mailbox 
contents as updated by the processor. Any pending processor writes to the 
mailbox will have completed by the time that the load of the MBPR has 
completed. 


3. The second barrier is required to ensure that the processor will read the mailbox 
contents as updated by the bridge. The returned data is accessed only after the 
DON bit is observed to be set by the servicing bridge. 


4. Software need not wait for the DON bit to become set. 
5. The mailbox RDATA is valid only when DON is set and ERR is clear. 
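

The validity rules in the notes above can be summarized in a small Python sketch that interprets the quadwords at mailbox offsets 40 and 32. The function name and the returned tuples are hypothetical conventions chosen for the illustration.

```python
DON = 1 << 0  # bit <0> of the quadword at offset 40
ERR = 1 << 1  # bit <1> of the quadword at offset 40


def read_result(status_quadword: int, rdata_quadword: int):
    """Classify a posted read as ('pending', None), ('error', status) or ('ok', rdata)."""
    if not status_quadword & DON:
        return ("pending", None)              # ERR, STATUS and RDATA not yet valid
    if status_quadword & ERR:
        return ("error", status_quadword >> 2)  # STATUS is bits <63:2>
    return ("ok", rdata_quadword & 0xFFFFFFFF)  # RDATA is bits <31:0>
```

The upper longword of the RDATA quadword is UNPREDICTABLE, which is why the sketch masks it off before returning the data.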


8.3.8 Remote I/O Space Write Accesses 


The bridge need not return status for remote I/O space writes. When the mailbox
DON bit is set by the bridge, the bridge has completed access to the mailbox
structure. The ERR bit and STATUS fields are testable. The actual write operation
need NOT have completed at the device and the ERR bit and STATUS fields can
indicate success (be cleared) even though success is not ensured. However, the ERR
bit and STATUS fields, if set, do accurately report an error condition.


The actual completion of a remote I/O space write access can only be observed 
indirectly. Either the appropriate device state must be read back, or the device must 
update a memory-resident data structure and/or request an interrupt. Remote I/O 
space read access(es) may be posted anytime after posting the write access. Because 
mailbox operations to the same remote bus are guaranteed to be ordered, the read 
is guaranteed to occur after the write. 


Errors encountered on remote I/O space write accesses are reported by bridge error 
interrupts. The bridge side which encounters the error requests the interrupt. Thus, 
a non-existent hose error may be reported by the local (PMI) side of the bridge, while 
a non-existent remote bus address error is reported by the remote (I/O bus) side of 
the bridge. 


Remote I/O space write accesses may be performed as follows: 


remote_write:
        <load Rm with VA of mailbox>
        <ensure mailbox no longer in use by bridge>
        <derive and load mailbox CMD, MASK, HOSE, RBADR, and WDATA fields>
        STQ     R31,40(Rm)              ; Clear DON/ERR/STATUS
        MB                              ; Barrier (see note 2)

post_mbx:
        <derive PA of mailbox and load R1>
        <derive VA of MBPR and load R0>
        STQ_C   R1,(R0)
        BEQ     R1,wait_post_mbx
        <mailbox posted; continue>

wait_post_mbx:
        <backoff delay>
        BR      post_mbx


Notes: 


1. The mailbox is no longer in use by a bridge whenever the DON bit has been set
by the servicing bridge or the mailbox is newly allocated.


2. The barrier is required to ensure that the bridge will read the mailbox contents 
as updated by the processor. Any pending processor writes to the mailbox will 
have completed by the time that the load of the MBPR has completed. 


3. If the mailbox data is static, e.g. used to set a “go” bit in a device control 
register, the mailbox may be posted without regard to the state of the DON 
bit. Barriers are not required each time a static mailbox is posted, however a 
barrier is required after the mailbox contents are initialized and prior to its first 
use. 


8.4 Direct Memory Access (DMA)


8.4.1 Access Granularity 


A device or bridge side access to a memory-like region, or “DMA”, is taken to be 
atomic when: 


• It is not possible for a single device read DMA of a data structure which is
  updated by a single processor write to observe a partial update of that structure.


• It is not possible for a processor reading a data structure which is updated by a
  single device write DMA to observe a partial update of that structure.


A processor treats any memory-resident data structures which are shared with 
an I/O device as though the structures were shared with another processor. The 
processor must follow the guidelines given in Common Architecture, Chapter 5. 
Specifically, barrier instructions must be used: 


1. After updating a shared memory-resident data structure and before setting an 
associated flag indicating that the data structure is valid. 


2. After observing a newly updated flag, and prior to accessing the associated shared 
memory-resident data structure. 


The atomic DMA size guaranteed to a local device is a function of the PMI protocol.
The minimum size is an aligned hexword. Locally connected devices must obey the
PMI protocol and may participate in the memory cache coherency policy. See the
guidelines in Common Architecture, Chapter 5.


The atomic DMA size guaranteed to a remote device is a function of the remote I/O
bus protocol. Remote devices are guaranteed atomic access to aligned hexwords or 
the remote I/O bus transfer burst size, whichever is smaller. It is the responsibility 
of the local bridge side to ensure the atomicity of the device DMA. 


Larger atomic DMA granularity permits optimization of device control protocols. 
When a data structure and the associated flag are contained within a single aligned 
hexword, the device can update both simultaneously with a single write DMA. 
Similarly, the device may access both the data structure and the associated flag with 
a single read DMA. If the flag is valid, the data structure contains valid information; 
an additional read DMA is not necessary to obtain the valid data. 
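

Whether a data structure and its flag can be updated by one atomic DMA comes down to a simple containment check, sketched here in Python. The function names are hypothetical, and a hexword is taken to be 32 bytes.

```python
HEXWORD_BYTES = 32  # one hexword is 256 bits


def in_one_aligned_hexword(addr: int, size: int) -> bool:
    """True if the byte range [addr, addr + size) lies within one aligned hexword."""
    if size <= 0:
        return False
    return addr // HEXWORD_BYTES == (addr + size - 1) // HEXWORD_BYTES


def remote_atomic_size(burst_bytes: int) -> int:
    """Atomic DMA size for a remote device: the smaller of a hexword and the bus burst."""
    return min(HEXWORD_BYTES, burst_bytes)
```

A structure that fails the first check needs the flag-and-barrier protocol described earlier in this section rather than a single combined DMA.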


HARDWARE/SOFTWARE IMPLEMENTATION NOTE 
The hexword write DMA size was chosen as the smallest
cache block size of the first Alpha implementations
\(Cobra and Flamingo)\.


8.4.2 Read/Write Ordering 


DMAs may be divided into the “control” stream and the “data” stream. These 
streams differ in their ordering properties. 


• Control stream accesses are guaranteed to be ordered. An implicit barrier occurs
  before and after each access. Control stream ordering must be preserved by all
  bridges between a given remote I/O device and processor memory.


• Data stream DMAs may be arbitrarily reordered if permitted by the protocol of
  that I/O bus. No implicit barriers are associated with this stream.


A device may use control stream DMAs to ensure ordering of the data stream DMAs 
and of interrupt requests as seen by a processor or other device sharing the same 
memory-resident structures. Data stream DMAs must not be reordered with respect 
to control stream DMAs. Interrupt requests must not be reordered with respect to 
control stream DMAs. 


Control stream DMAs must be used: 


• As the last DMA issued to update a memory-resident data structure before
  requesting a processor interrupt to notify the processor of the update. This DMA
  ensures that any previously issued data stream DMAs become visible to the
  processor prior to the interrupt.


• To update any pointer or other linkage between memory-resident data structures.
  Consider a status buffer which is located by a status ring pointer. The status
  buffer may be updated with either a control or data stream DMA. The ring pointer
  must be updated with a control stream DMA which is issued after the last DMA
  used to update the status buffer.
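

The ordering rules for the two streams can be expressed as a check on whether an observed DMA order is a permissible reordering of the issued order. The following Python sketch is a simplified model with hypothetical names: each DMA is a (name, kind) pair, data stream DMAs may be permuted only between neighbouring control stream DMAs, and control stream DMAs keep their positions.

```python
def is_legal_reordering(issued, observed):
    """Check `observed` against `issued`; both are lists of (name, kind).

    `kind` is 'control' or 'data'. Data DMAs may be reordered among
    themselves but must not cross a control DMA.
    """
    if len(issued) != len(observed):
        return False
    start = 0
    # A sentinel control entry closes the final run of data DMAs.
    for i, (_, kind) in enumerate(issued + [(None, "control")]):
        if kind != "control":
            continue
        # Data DMAs between two control DMAs may appear in any order.
        if sorted(issued[start:i]) != sorted(observed[start:i]):
            return False
        # The control DMA itself must stay in place.
        if i < len(issued) and observed[i] != issued[i]:
            return False
        start = i + 1
    return True
```

In the status ring example, the ring pointer update is the control DMA, so no reordering that moves a status buffer write after the pointer update passes the check.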


A bridge must preserve the ordering of control stream DMAs regardless of whether
the accesses are reads or writes.


The division of direct memory accesses into the control stream and the data stream is
the responsibility of the device. I/O bus protocols which do not permit the separation 
of control and data stream DMAs must preserve the ordering of all DMAs and 
interrupt requests; all DMAs are considered to be control stream DMAs. Similarly, 
hose protocols which do not permit the separation of control and data stream DMAs 
must preserve the ordering of all DMAs and interrupt requests. 


Bridge implementations must guarantee forward progress on all DMA operations. 


8.4.3 Device Address Translation 


I/O devices use only physical addresses; devices must not access page tables for 
the purpose of address translation. Devices are independent of any virtual memory 
translation scheme and processor page size. 


8.5 Interrupts 


An interrupt request from an I/O device consists of an interrupt priority level and 
an interrupt vector. Device interrupt requests are defined to be priorities 20 to 23. 
The interrupt vector identifies the appropriate interrupt service routine; the starting 
address of the interrupt service routine is obtained by using the vector as an offset 
from the base of the System Control Block (SCB). 


All bridge implementations must maintain both the temporal order and relative 
priority of device interrupts. A bridge must not expedite a lower priority request if 
a higher priority request has been received. With one exception, a bridge must not 
reorder two interrupt requests at the same priority level. A bridge is permitted to 
expedite delivery of a fatal bridge error interrupt; this interrupt must be at IPL 23 
and may take precedence over any IPL 23 device interrupts.
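

The delivery rules above (higher priority first, temporal order within a priority level) can be sketched as a small priority queue in Python. The class and method names are hypothetical, and the sketch deliberately omits the fatal bridge error exception.

```python
import heapq
import itertools


class InterruptQueue:
    """Toy model of a bridge's pending device interrupt requests."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # arrival order, used as a tie-breaker

    def request(self, ipl: int, vector: int) -> None:
        if not 20 <= ipl <= 23:
            raise ValueError("device interrupt requests use priorities 20 to 23")
        # Negate the IPL so the highest priority is popped first.
        heapq.heappush(self._heap, (-ipl, next(self._seq), vector))

    def deliver(self):
        """Return (ipl, vector) of the next request to deliver."""
        neg_ipl, _, vector = heapq.heappop(self._heap)
        return -neg_ipl, vector
```

The arrival counter is what keeps two requests at the same priority level from being reordered.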


A bridge may prefetch the interrupt vector from an I/O device to reduce the processor 
overhead associated with interrupt dispatch. Vector prefetch reduces the processor 
latency necessary to dispatch to the interrupt service routine by reducing the delay 
associated with the delivery of the interrupt vector to the processor. 


When a bridge delivers an interrupt from an I/O device, any pending control stream
DMA writes issued by the device must have become visible to the processors. Note
that due to the ordering of control stream DMAs, any data stream DMA writes issued
prior to the last pending control stream DMA must also have become visible to the
processors.


In multi-processor configurations, interrupts may be directed to a subset of the 
processors in the configuration. Such targeting is implementation specific. \See
Section 8.8.\ 


8.6 I/O Bus-Specific Mailbox Usage 


\Send mail to EAGLE1::ALPHA_SRM to register a new Alpha system or bridge 
side. \ 


8.6.1 Mailbox Field Checking


Bridge sides check only implemented functions. It is the responsibility of the posting 
software to ensure that the mailbox data structure fields are valid and that the 
structure is posted correctly. 


1. Local sides need not check the MASK, B, CMD, RBADR, or WDATA fields.
2. Local sides which connect to a single hose need not check the HOSE field.
3. Local sides need not pass the HOSE or W fields to the remote bridge side.
4. Remote bridge sides which do not implement masking need not check the MASK
field.


5. There is no consistency checking between the W and CMD fields. If the W 
bit is set and the CMD field indicates a read, the result is UNPREDICTABLE. 
Similarly, if the W bit is clear and the CMD field indicates a write, the result is 
UNPREDICTABLE. 


6. Remote bridge sides check only implemented CMD and RBADR bits. 


8.6.2 CMD Field


The CMD field consists of two subfields:


• A remote I/O bus specific subfield.


This subfield is common to all Alpha systems and contains the controls for a given
remote bus. The common subfield must be backward compatible; all systems
which connect to a given I/O bus share this subfield.


• A system-specific subfield.


This subfield is specific to each Alpha system and contains the controls for a
given bridge implementation or system-specific diagnostic functions.


The size of each subfield is specific to the remote I/O bus. The bridge specification must
include the definitions of all valid commands. This partition promotes software 
portability. A given device driver uses the same CMD for a given type of device 
access, regardless of the platform. Diagnostic software can also interpret the 
common field without regard to the platform on which the mailbox was posted. 


8.6.3 Special Commands 


The special “WHO_ARE_YOU” command (W=0, B=1, CMD=0) is common to all 
bridge implementations. WHO_ARE_YOU is used to determine the type of remote 
bridge side. In response to a mailbox operation with a WHO_ARE_YOU command 
and RBADR of 0, the remote bridge side returns a unique remote bus side identifier. 
All other commands are specific to the type of remote bus and independent of the 
bridge implementation. 
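

As a hedged illustration, the WHO_ARE_YOU encoding can be written down as a small
Python sketch. The Mailbox class, its field types, and the two helper functions are
assumptions made for illustration only; the architected mailbox layout is defined
in Section 8.3 and is not reproduced here. Only the values W=0, B=1, CMD=0, and
RBADR=0 come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Mailbox:
    """Illustrative subset of mailbox fields; NOT the architected layout."""
    cmd: int = 0    # CMD field: remote bus command
    b: int = 0      # B bit
    w: int = 0      # W bit: set for writes, clear for reads
    hose: int = 0   # HOSE field
    rbadr: int = 0  # RBADR field: remote bus address
    don: int = 0    # DON bit: set by the bridge on completion


def who_are_you(hose: int = 0) -> Mailbox:
    """Build a WHO_ARE_YOU request: W=0, B=1, CMD=0, RBADR=0."""
    return Mailbox(cmd=0, b=1, w=0, hose=hose, rbadr=0)


def is_who_are_you(mbx: Mailbox) -> bool:
    """Recognize the special command by its W, B, and CMD values."""
    return mbx.w == 0 and mbx.b == 1 and mbx.cmd == 0


request = who_are_you(hose=2)
```

The hose number passed to who_are_you is an arbitrary example value.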


8.7 \Implementation Considerations


8.7.1 Mailbox Selection


The choice of direct or mailbox access (local or remote I/O space) should be made
after consideration of the following:


• The processor overhead associated with waiting for the return data.
• The occupancy of the processor-memory interconnect during the access to an I/O
location on an I/O bus.
• The performance of the device.
• The complexity of the logic required to implement.
• The software impact.


The direct access method, with or without associated address mapping registers, is 
subject to the following problems on Alpha systems: 


1. Access Delay. 


The I/O bus and device are typically much slower than the processor-memory 
interconnect and the processor. 


2. Access Granularity. 


The Alpha instruction set supports only aligned quadword and longword accesses.
Many I/O devices and buses require accesses that span less than four bytes; full 
longword accesses can generate undesired side effects. 


3. Address Granularity. 


Alpha processors may have caches leading to designs which perform reads and 
writes to naturally aligned cache blocks. The length of a cache block is usually 
greater than a quadword. For memory accesses, the processor need never issue 
the lower address bits. Additional hardware costs would be incurred to enable 
the processor to access arbitrarily aligned longwords. 


4. Physical Address Size.


Many I/O buses now have address spaces that exceed the Alpha address space.
High performance systems need multiples of such buses. It is no longer feasible
to compress or fold the I/O bus address space into a portion of the processor I/O
space.


The mailbox access method addresses the above problems, but has other 
disadvantages. Foremost are: 


• Much software has been written to perform direct (mapped) access.


Such software must be modified to use mailbox access. Mapped I/O accesses will
be compiled to longword or quadword accesses, since an Alpha compiler cannot
know that any particular access is to remote I/O space. Furthermore, the LDx
accesses may be reordered from the data usage. As such, it is not possible simply
to formulate an exception-based mechanism to transparently trap and handle I/O
space accesses. The exception handler would have to have detailed knowledge of
the device accessed to be able to resolve the appropriate access granularity.


• A mailbox access may require more memory accesses and processor
instructions than a direct access.


The significance of this factor depends on the relative access latencies of remote 
I/O space and memory. If the remote I/O space access latency is significantly 
longer, the effective overhead of a mailbox access will be no more than a direct 
access. If the remote I/O space access latency is on the order of the memory 
access latency, the mailbox access overhead may be significant. 


For devices which require very fast or very frequent I/O space accesses, e.g. 
frame buffers, mailbox accesses can be expected to give unacceptable system 
performance. Additional hardware such as a companion DMA engine or attached 
local processor must be coupled to the device. 


To promote portability, software should be written to accommodate a bridge. It is 
recommended that ALL I/O location reads and writes are made through subroutines. 
Parameters to these routines should include all the fields necessary to use a mailbox, 
see Section 8.3.3. 
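

A minimal sketch of this recommendation follows, in Python for brevity. The class
names, method signatures, command values, and register address are all hypothetical;
the parameter list simply mirrors the HOSE, CMD, MASK, RBADR, and WDATA mailbox
fields named in Section 8.6.1. Both back ends are in-memory stand-ins, not models
of real hardware.

```python
from abc import ABC, abstractmethod


class IoAccess(ABC):
    """Hypothetical driver-facing interface: all register accesses go through it."""

    @abstractmethod
    def read(self, hose, cmd, mask, rbadr):
        """Return the data read from remote bus address rbadr."""

    @abstractmethod
    def write(self, hose, cmd, mask, rbadr, wdata):
        """Write wdata to remote bus address rbadr."""


class DirectAccess(IoAccess):
    """Stand-in for a platform with mapped I/O; mailbox-only fields are ignored."""

    def __init__(self):
        self.registers = {}

    def read(self, hose, cmd, mask, rbadr):
        return self.registers.get(rbadr, 0)

    def write(self, hose, cmd, mask, rbadr, wdata):
        self.registers[rbadr] = wdata


class MailboxAccess(IoAccess):
    """Stand-in for a bridged platform; each access is logged as a posted mailbox."""

    def __init__(self):
        self.registers = {}
        self.posted = []

    def read(self, hose, cmd, mask, rbadr):
        self.posted.append({"W": 0, "HOSE": hose, "CMD": cmd,
                            "MASK": mask, "RBADR": rbadr})
        return self.registers.get((hose, rbadr), 0)

    def write(self, hose, cmd, mask, rbadr, wdata):
        self.posted.append({"W": 1, "HOSE": hose, "CMD": cmd,
                            "MASK": mask, "RBADR": rbadr, "WDATA": wdata})
        self.registers[(hose, rbadr)] = wdata


def enable_device(io, hose=0):
    """Driver code written once against IoAccess, independent of the platform."""
    io.write(hose, cmd=1, mask=0xF, rbadr=0x10, wdata=0x1)
    return io.read(hose, cmd=0, mask=0xF, rbadr=0x10)
```

The driver routine enable_device runs unchanged against either back end, which is
the portability property the recommendation is after.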


8.7.2 Mailbox Pointer Register Flow Control Selection 


Each Mailbox Pointer (MBPR) register represents a resource to the processor. Either
that resource must appear to be infinite, or a flow control mechanism is necessary.


The MBPR resources appear to be infinite when, barring hardware errors, posting 
a mailbox access is guaranteed to succeed. A sufficiently deep FIFO structure 
implemented behind the MBPR register could appear infinite. The depth of the 
FIFO will be a function of the number of I/O devices to be supported and the access 
characteristics of those devices. A hardware mechanism for backoff-retry access to 
the MBPR incorporated in the PMI protocol could also provide such a guarantee. 


A flow control mechanism for MBPR register accesses must be atomic. The MBPR is 
accessed by code threads which execute at multiple IPLs. A single software MBPR 
ownership flag would lead to priority inversion and/or deadlock. A higher IPL code
thread executing on one processor will block if the flag is owned by a lower IPL code 
thread executing on a different processor. 


The MBPR register access flow control mechanism should not add significant
overhead to critical code paths. Performing MBPR accesses only at IPL 31 or via 
dedicated PALcode can have significant system performance implications. Statically 
allocating some number of MBPR resources (FIFO entries) per IPL and/or per 
processor requires that the software thread determine the IPL/processor execution 
environment. Note that such static allocation schemes are not guaranteed to be 
portable between Alpha systems. 


The first Alpha implementations use a single STQ_C instruction and the
associated lock_flag to implement MBPR register access flow control. This is an
implementation choice and not architected. Subsequent implementations may select
other mechanisms, since this use of STQ_C may have performance
implications.


IMPLEMENTATION NOTE 
As an example, consider a processor with virtual caches. 
Virtual address translation would be required on all 
STQ_C instructions to differentiate the MBPR accesses 
from the memory accesses; the translation overhead 
would slow all STQ_C instructions.
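

The flow control idea in this section can be sketched as a single atomic attempt
that either succeeds or fails, with no ownership flag held between attempts. The
Python model below is an assumption-laden illustration: the Mbpr class, its FIFO
depth, and the retry helper are hypothetical, and try_post only imitates the
success/failure result of a store-conditional. It does not model STQ_C itself.

```python
from collections import deque


class Mbpr:
    """Toy model of an MBPR backed by a bounded FIFO (depth is an assumption)."""

    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()

    def try_post(self, mailbox_addr):
        """One atomic attempt: accept the mailbox address or refuse it."""
        if len(self.fifo) >= self.depth:
            return False
        self.fifo.append(mailbox_addr)
        return True

    def service_one(self):
        """Bridge side: retire the oldest posted mailbox, if any."""
        return self.fifo.popleft() if self.fifo else None


def post_with_retry(mbpr, mailbox_addr, max_attempts, on_retry=None):
    """Return the attempt number that succeeded, or 0 if every attempt failed."""
    for attempt in range(1, max_attempts + 1):
        if mbpr.try_post(mailbox_addr):
            return attempt
        if on_retry is not None:
            on_retry()  # e.g. back off; here it lets the toy bridge drain
    return 0
```

Because a failed attempt leaves no state behind, a thread at any IPL on any
processor can retry without waiting on a flag owned by another thread.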


8.7.3 Mailbox Starvation 


The MBPR register represents a shared system resource. Software which issues 
mailbox accesses should use that resource in a manner which guards against 
starvation or access lockout. 


Consider two software threads each issuing repeated mailbox accesses. There are 
three cases of interest: 


1. Each thread is executing on a unique processor in a multi-processor configuration. 
The bridge hardware implementation will provide fair MBPR access to each 
thread. Neither thread can cause the starvation of the other. 


2. Both threads are scheduled for execution at non-elevated IPL (IPL 0) on the 
same processor in a multi-processor configuration or on the only processor in 
a uni-processor configuration. The operating system software scheduling policy 
may provide fair MBPR access to each thread, or may allow either thread to 
cause the starvation of the other. 


3. \Both threads are scheduled for execution on the same processor in a multi- 
processor configuration or on the only processor in a uni-processor configuration 
and at least one of the threads is scheduled for execution at elevated IPL 
(IPL > 0). The thread which executes at the highest IPL can cause starvation 
of the thread executing at the lower IPL level. If both threads are scheduled to 
execute at the same IPL, either thread can cause starvation of the other. 


Software threads which execute at high IPL for extended periods can have severe
system performance implications. Remote I/O space accesses are inherently slow
with respect to processor speeds; remote I/O accesses can easily take in excess of
1000 instructions. Software which spins at high IPL waiting for the DON bit or
repeatedly posting mailbox accesses may execute for extended periods and cause
blockage of other event delivery.


8.7.4 Mailbox Structure Synchronization Properties 


As explained in Section 8.3.4, the software and the servicing bridge may synchronize 
their accesses to the mailbox structure by using the DON bit. 


Bus bridge implementations may overwrite the full mailbox structure when setting
the DON bit. The bridge may perform a full 64-byte write to the mailbox structure
rather than a single quadword write or 32-byte write. If the bridge writes into the
first hexword, the original mailbox contents must be restored; the bridge must not
cause the contents of the first hexword to be altered.


Software must not alter the mailbox contents at any time after writing the MBPR
and prior to observing the DON bit set. Any such changes may or may not be
observed by the bridge. Any such changes may or may not be overwritten by the
bridge. The resulting remote bus access and the resulting mailbox contents are
UNPREDICTABLE.


Software may choose to ignore the DON bit if the contents of the mailbox structure
are truly static. Software may post the same mailbox repeatedly. Bridge 
implementations must be able to correctly access the same mailbox in the event 
of back-to-back MBPR writes with the same mailbox address. Note that in this 
case, the contents of the DON, ERR, and STATUS fields are UNPREDICTABLE. 
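

The DON handshake described in this section can be sketched as follows. This is a
simulation under stated assumptions: the dictionary representation of the mailbox,
the RDATA field name, the returned data value, and the SimulatedBridge timing are
all hypothetical, as is the convention that software clears DON before posting.
What the sketch does take from the text is that software leaves the mailbox
untouched between the MBPR write and the observation of DON.

```python
class SimulatedBridge:
    """Toy bridge that completes a posted mailbox after `delay` steps."""

    def __init__(self, delay):
        self.delay = delay
        self.countdown = 0
        self.pending = None

    def write_mbpr(self, mailbox):
        self.pending = mailbox
        self.countdown = self.delay

    def step(self):
        if self.pending is None:
            return
        self.countdown -= 1
        if self.countdown <= 0:
            self.pending["RDATA"] = 0xABCD  # made-up read data
            self.pending["DON"] = 1         # bridge signals completion
            self.pending = None


def post_and_wait(mailbox, write_mbpr, bridge_step, max_polls):
    """Post a mailbox and poll DON; return the poll count, or None on timeout."""
    mailbox["DON"] = 0        # assumed software convention before posting
    write_mbpr(mailbox)
    for polls in range(max_polls):
        if mailbox["DON"]:    # read-only check; the mailbox is not modified
            return polls
        bridge_step()         # stands in for elapsed time in this simulation
    return None
```

A bounded poll count is used here so that the caller does not spin indefinitely,
in line with the concern about extended execution raised in Section 8.7.3.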


8.7.5 I/O Device Properties 


Devices should be designed such that register accesses in the main code path can 
be retried with minimum knowledge of the nature of the device or the side effects of 
the access. Read accesses should not be used to signal a device to poll a command 
queue, increment a counter or pointer, or initiate an I/O operation. This permits the
software error recovery from transient errors to occur outside the main execution
thread of the device driver.


Device designs are strongly encouraged NOT to require reads from device registers 
during normal operation. Such reads can easily take in excess of 1000 instruction 
cycles and become a major performance impact in a very high speed system. 


Device designs are strongly encouraged NOT to require multiple back-to-back writes 
to device registers during normal operation. Such writes can lead to congestion at 
the MBPR, thus causing at least the issuing processor to wait. Such congestion can 
become a major performance impact in a very high speed system. 


The mailbox protocol does not provide any indication that a write has actually 
completed at the device. Device designs which use writes to registers to initiate 
device actions are strongly encouraged to include a mechanism in the control protocol 
to detect a lost signal or otherwise simply recover from a delayed notification. 


8.7.6 Implications of Memory Accesses by Devices 


Devices access memory for the exchange of command, status, and data with the 
processor. Repeated processor accesses to non-cached locations, even if the location 
is resident on the processor-memory interconnect, may have a negative performance 
impact in a very high speed system. Such accesses should be replaced with cacheable 
(e.g. memory) accesses wherever possible. 


Bridges and local devices may incorporate physical memory buffers and participate 
in the cache coherency policy. A bridge implementation which includes a cache may 
not permit hits under misses for control stream DMA reads. Such reordering would 
prohibit a device from issuing two back-to-back control stream DMA reads to access 
a single data structure since the cache hit could contain outdated data. 


The dominant component of delay in a read DMA request by a remote I/O device
may be the memory access latency rather than the data transmission time. Fewer,
larger memory accesses are preferable to many small accesses. Also, write control
stream DMAs to less than a full cache block may consume PMI resources if the
bridge must do a read-modify-write.


The device control protocol data structures should be compact and naturally aligned.
Note that this may require some memory-to-memory copies by the processor. Small
memory reads which must be serialized should be minimized; a common cause of
such reads is when the device chases a collection of pointers.


Device control protocols must NOT make use of memory interlocks. Devices are not 
guaranteed emulation of the VAX interlocked instructions such as INSQTI/REMQTI. 
Use of functionality equivalent to LDx_L/STx_C need not be supported by bridges 
and is not recommended for remote devices. 


8.7.7 Interrupts


A device interrupt allows a device or bridge to signal processors for various reasons,
often including the following:


• Device solicitations for new I/O operations.
• Operation completion.
• Availability of operation status.
• Error occurrences.
• Non-host-originated software-relevant changes in device or bridge state or
identity.


Device port protocols are strongly encouraged to minimize the use of interrupts, since 
interrupts have an expensive, and increasing, performance impact. The performance 
impact is due to many factors. Interrupts cause processor pipeline breaks and 
the execution of diverse short code threads which lower the effective cache and 
translation buffer hit rate. Instruction execution is slowed during the time required
to obtain the hardware interrupt vector. 


Interrupts in an Alpha system may target one or more processors. While multiple 
processors may respond, only one will actually transfer control to the interrupt 
service routine. 


Conceptually, for a device on an I/O bus, the interrupt protocol is: 


1. The device issues an interrupt request to the I/O module. The request specifies 
at least an interrupt level, corresponding to IPL 20 to 23. 


2. The bridge may prefetch the interrupt vector. This reduces the latency associated 
with the delivery to the responding processor. 


3. The bridge issues an interrupt request to some subset of the processors in the 
system. If the PMI protocol permits, the vector may be forwarded with the
interrupt request.


4. When the IPL of an interrupted processor is lower than that of one or more
outstanding interrupts, the processor will obtain a hardware interrupt vector
if it does not already have one. The first processor to request a vector from a
bridge or device will obtain the next pending vector. The “next pending” vector
is determined by the IPL and time sequence order in which interrupts became
pending at the bridge or device. The bridge or device does not reorder interrupts
with the exception of a fatal bridge error interrupt; the latter occurs only at IPL
23.


5. The processor obtaining the hardware interrupt vector uses it as an offset from 
the base of the System Control Block. The System Control Block element contains 
the software interrupt vector, which is the starting address of the interrupt 
service routine. The software interrupt vector is referred to as the interrupt 
vector in Part II, Operating Systems. The processor transfers control to this 
address. 
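

The ordering and dispatch rules in the steps above can be sketched as a priority
queue keyed on IPL and arrival order. This is a hedged Python illustration, not a
description of any bridge implementation: the class and function names, the vector
values, and the dictionary used for the System Control Block are assumptions. The
sketch chooses to expedite the fatal bridge error interrupt, which the text permits
but does not require.

```python
import heapq
from itertools import count


class BridgeInterrupts:
    """Toy model of pending device interrupts at a bridge."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # arrival order, used to break ties within an IPL

    def request(self, ipl, vector, fatal_bridge_error=False):
        if not 20 <= ipl <= 23:
            raise ValueError("device interrupts use IPL 20 to 23")
        if fatal_bridge_error and ipl != 23:
            raise ValueError("fatal bridge error interrupts must be at IPL 23")
        # Higher IPL first; fatal bridge error ahead of its IPL; then arrival order.
        key = (-ipl, 0 if fatal_bridge_error else 1, next(self._seq))
        heapq.heappush(self._heap, (key, vector))

    def next_vector(self, processor_ipl):
        """Return the next pending vector above processor_ipl, or None."""
        if not self._heap:
            return None
        key, vector = self._heap[0]
        if -key[0] <= processor_ipl:
            return None
        heapq.heappop(self._heap)
        return vector


def service_routine(scb, vector):
    """Use the vector as an offset from the SCB base to find the routine address."""
    return scb[vector]


bridge = BridgeInterrupts()
bridge.request(20, 0x800)
bridge.request(22, 0x810)
bridge.request(22, 0x820)
bridge.request(23, 0x830)
bridge.request(23, 0x840, fatal_bridge_error=True)
order = [bridge.next_vector(processor_ipl=0) for _ in range(5)]
```

In this run the vectors are delivered in the order 0x840, 0x830, 0x810, 0x820,
0x800: the fatal bridge error first, then by descending IPL, with the two IPL 22
requests kept in arrival order.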


As a minimum, there should be no more than one interrupt on average for each 
operation carried out by the device. 


8.8 Targettable Interrupts 


In multi-processor configurations, interrupts may be directed, or targetted, to a 
subset of the processors in the configuration. The targetted subset may include 
one or more of the processors. Different interrupt sources, e.g. bridges, hoses, or 
devices, may be targetted to a different subset. Such targetting is implementation 
specific. 


Implementations which target interrupts must include mechanisms for handling the 
precedence of the bridge or device error interrupt. When interrupts can be taken by
one of many processors, an error interrupt may be taken by one processor while a 
success interrupt is taken by another processor. If the event which generated the 
error interrupt is related to the event which generated the success interrupt, the 
error interrupt must be fully serviced before the success interrupt can be serviced. 


As an example, consider a device which issues a control stream DMA write, then 
requests a completion (success) interrupt. If a bridge incurs an error on that DMA, 
the bridge may discard the DMA data and request an error interrupt. If the two 
interrupts are serviced simultaneously on two different processors, the software 
thread servicing the success interrupt may take incorrect action based on faulty 
(stale) data. The error condition must be evaluated prior to permitting the success 
code thread to execute. 







8.9 \Revision History
Revision 5.0, May 12, 1992
1. Changed 'widget' to 'device'
2. Split chapter such that Sections 1.1 through the text part of 1.6.3 are now
external Chapter 8 of the Common Section, Table 1-3 and all text/tables through
1.6.3.2 (Futurebus+...) are placed in Appendix D, and 1.7 (Implementation
Considerations) to end of chapter are internal (backslash) Chapter 8 of the
Common Section
3. Changed hex IPLs to decimal
4. Made specified internal references external
5. Added ECO #22
6. Converted to SDML
7. Changed all 'unpredictable' to 'UNPREDICTABLE'
8. Changed SLL to SRL under 'check error:' in remote read pseudocode
9. Removed all revision history prior to Rev 4.0, 29 March 1991


Revision 4.1, August 12, 1991 
1. Renumbered Chapter to #11 with inclusion of Console ECO #15 


Revision 4.0, March 29, 1991 
1. Inclusion in REV 4.0 of the SRM numbering to assume SRM version values 


OpenVMS Alpha Software (II)


This section describes how the OpenVMS operating system relates to the Alpha architecture 
and contains the following chapters: 


• Chapter 1, Introduction to OpenVMS Alpha (II)
• Chapter 2, OpenVMS PALcode Instruction Descriptions (II)
• Chapter 3, OpenVMS Memory Management (II)
• Chapter 4, OpenVMS Process Structure (II)
• Chapter 5, OpenVMS Internal Processor Registers (II)
• Chapter 6, OpenVMS Exceptions, Interrupts, and Machine Checks (II)





Chapter 1
Introduction to OpenVMS Alpha (II)


The goals of this design are to provide a hardware implementation independent 
interface between OpenVMS and the hardware. Further, the design provides the 
needed abstractions to minimize the impact between OpenVMS and the different 
hardware implementations. Finally, the design must contain only that overhead 
necessary to satisfy those requirements, while still supporting high-performance 
systems. 


1.1 Register Usage 


Besides those registers described in Part I, Common Architecture, OpenVMS defines 
the registers described in the following sections. 


1.1.1 Processor Status 


The Processor Status (PS) is a special register that contains the current status of the 
processor. It can be read by the CALL_PAL RD_PS instruction. The software field 
(PS<SW>) can be written by the CALL_PAL WR_PS_SW routine. See Chapter 6 for 
a description of the PS register.


1.1.2 Stack Pointer (SP) 
Integer register R30 is the Stack Pointer (SP).
The SP contains the address of the top of the stack in the current mode. 


Certain PALcode instructions, such as CALL_PAL REI, use R30 as an implicit 
operand. During such operations, the address value in R30, interpreted as an 
unsigned 64-bit integer, decreases (predecrements) when items are pushed onto the 
stack, and increases (postincrements) when they are popped from the stack. After 
pushing (writing) an item to the stack, SP points to that item. 
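

The predecrement and postincrement behavior can be illustrated with a short Python
model. The Stack class and its dictionary-backed memory are hypothetical, and the
8-byte item size is an assumption made for the sketch; the 64-bit unsigned
interpretation of the address comes from the text above.

```python
MASK64 = (1 << 64) - 1  # addresses are treated as unsigned 64-bit integers


class Stack:
    """Toy downward-growing stack of quadword items (item size is assumed)."""

    def __init__(self, sp):
        self.sp = sp & MASK64
        self.mem = {}

    def push(self, value):
        self.sp = (self.sp - 8) & MASK64  # predecrement, then write
        self.mem[self.sp] = value & MASK64

    def pop(self):
        value = self.mem[self.sp]         # read, then postincrement
        self.sp = (self.sp + 8) & MASK64
        return value
```

After a push, sp is the address of the item just written, matching the statement
that SP points to the pushed item.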


1.1.3 Internal Processor Registers (IPRs) 


The IPRs provide an architected mapping to internal hardware or provide other 
specialized uses. They are available only to privileged software through PALcode 
routines and allow OpenVMS to interrogate or modify system state. The IPRs are 
described in Chapter 5. 


1.2 \Revision History
Revision 1.0, May 12, 1992
• Created for SRM Version 5
• First review distribution


Chapter 2
OpenVMS PALcode Instruction Descriptions (II)


This chapter describes the PALcode instructions that are implemented for the
OpenVMS Alpha environment. The PALcode instructions are a set of unprivileged
and privileged CALL_PAL instructions that are used to match specific operating
system requirements to the underlying hardware implementation.


For example, privileged PALcode instructions switch the hardware context 
of a process structure. Unprivileged PALcode instructions implement the 
uninterruptible queue operations. Also, PALcode instructions provide standard
interrupt and exception reporting mechanisms that are independent of the 
underlying hardware implementation. 


Table 2-1 lists all the unprivileged and privileged OpenVMS PALcode instructions
and the section in this chapter in which they are described. 


Table 2-1: OpenVMS PALcode Instructions


Unprivileged OpenVMS PALcode Instructions


Mnemonic     Operation                               Section
AMOVRM       Atomic move register/memory             Section 2.4
AMOVRR       Atomic move register/register           Section 2.4
BPT          Breakpoint                              Section 2.1
BUGCHK       Bugcheck                                Section 2.1
CHME         Change mode to executive                Section 2.1
CHMK         Change mode to kernel                   Section 2.1
CHMS         Change mode to supervisor               Section 2.1
CHMU         Change mode to user                     Section 2.1
GENTRAP      Generate software trap                  Section 2.1
IMB          I-stream memory barrier                 Common Architecture, Chapter 6
INSQxxx      Insert in specified queue               Section 2.3
PROBER       Probe read access                       Section 2.1
PROBEW       Probe write access                      Section 2.1
RD_PS        Read processor status                   Section 2.1
READ_UNQ     Read unique context                     Section 2.5
REI          Return from exception or interrupt      Section 2.1
REMQxxx      Remove from specified queue             Section 2.3
RSCC         Read system cycle counter               Section 2.1
SWASTEN      Swap AST enable                         Section 2.1
WRITE_UNQ    Write unique context                    Section 2.5
WR_PS_SW     Write processor status software field   Section 2.1


Privileged OpenVMS PALcode Instructions


Mnemonic     Operation                               Section
CFLUSH       Cache flush                             Section 2.6
DRAINA       Drain aborts                            Common Architecture, Chapter 6
HALT         Halt processor                          Common Architecture, Chapter 6
LDQP         Load quadword physical                  Section 2.6
MFPR         Move from processor register            Section 2.6
MTPR         Move to processor register              Section 2.6
STQP         Store quadword physical                 Section 2.6
SWPCTX       Swap privileged context                 Section 2.6







2.1 Unprivileged General OpenVMS PALcode Instructions 


The general unprivileged instructions in this section, together with those in Sections 
2.3, 2.4, and 2.5, provide support for the underlying OpenVMS Alpha model. 


Table 2-2: Unprivileged General OpenVMS PALcode Instruction Summary


Mnemonic     Operation
BPT          Breakpoint
BUGCHK       Bugcheck
CHME         Change mode to executive
CHMK         Change mode to kernel
CHMS         Change mode to supervisor
CHMU         Change mode to user
GENTRAP      Generate software trap
IMB          I-stream memory barrier
             See Common Architecture, Chapter 6
PROBER       Probe read access
PROBEW       Probe write access
RD_PS        Read processor status
REI          Return from exception or interrupt
RSCC         Read system cycle counter
SWASTEN      Swap AST enable
WR_PS_SW     Write processor status software field


2.1.1 Breakpoint


Format:


CALL_PAL BPT    !PALcode format


Operation: 


{initiate BPT exception with new_mode=kernel} 


Exceptions: 


Kernel Stack Not Valid Halt 


Instruction Mnemonics:


CALL_PAL BPT Breakpoint 
Description: 


The BPT instruction is provided for program debugging. It switches to Kernel mode 
and pushes R2..R7, the updated PC, and PS on the Kernel stack. It then dispatches
to the address in the Breakpoint SCB vector. See Section 6.3.3.2.1. 


2.1.2 Bugcheck


Format:


CALL_PAL BUGCHK    !PALcode format


Operation:


{initiate BUGCHK exception with new_mode=kernel}


Exceptions: 


Kernel Stack Not Valid Halt 


Instruction Mnemonics: 


CALL_PAL BUGCHK  Bugcheck 
Description: 


The BUGCHK instruction is provided for error reporting. It switches to Kernel mode 
and pushes R2..R7, the updated PC, and PS on the Kernel stack. It then dispatches 
to the address in the Bugcheck SCB vector. See Section 6.3.3.2.2. 


2.1.3 Change Mode Executive


Format:


CALL_PAL CHME    !PALcode format


Operation:


tmp1 ← MINU( 1, PS<CM>)
{initiate CHME exception with new_mode=tmp1}


Exceptions: 


Kernel Stack Not Valid Halt 


Instruction Mnemonics:


CALL_PAL CHME Change Mode to Executive 
Description: 


The CHME instruction lets a process change its mode in a controlled manner. 


A change in mode also results in a change of stack pointers: the old pointer is saved, 
the new pointer is loaded. R2..R7, PC and PS are pushed onto the selected stack. 
The saved PC addresses the instruction following the CHME instruction. Registers 
R22, R23, R24, and R27 are available for use by PALcode as scratch registers. The 
contents of these registers are not preserved across a CHME. 


2.1.4 Change Mode to Kernel


Format:


CALL_PAL CHMK    !PALcode format


Operation:


{initiate CHMK exception with new_mode=kernel}


Exceptions:


Kernel Stack Not Valid Halt 


Instruction Mnemonics: 


CALL_PAL CHMK Change Mode to Kernel 
Description: 


The CHMK instruction lets a process change its mode to kernel in a controlled 
manner. 


A change in mode also results in a change of stack pointers: the old pointer is saved,
the new pointer is loaded. R2..R7, PC, and PS are pushed onto the kernel stack. 
The saved PC addresses the instruction following the CHMK instruction. Registers 
R22, R23, R24, and R27 are available for use by PALcode as scratch registers. The 
contents of these registers are not preserved across a CHMK.


2.1.5 Change Mode Supervisor


Format:


CALL_PAL CHMS    !PALcode format


Operation:


tmp1 ← MINU( 2, PS<CM>)
{initiate CHMS exception with new_mode=tmp1}


Exceptions:


Kernel Stack Not Valid Halt


Instruction Mnemonics:


CALL_PAL CHMS Change Mode to Supervisor


Description:


The CHMS instruction lets a process change its mode in a controlled manner.


A change in mode also results in a change of stack pointers: the old pointer is saved,
the new pointer is loaded. R2..R7, PC, and PS are pushed onto the selected stack.
The saved PC addresses the instruction following the CHMS instruction.


2.1.6 Change Mode User


Format: 


CALL_PAL CHMU !PALcode format 


Operation: 


{initiate CHMU exception with new_mode=PS<CM>} 
Exceptions: 
Kernel Stack Not Valid Halt 


Instruction Mnemonics: 


CALL_PAL CHMU Change Mode to User 
Description: 
The CHMU instruction lets a process call a routine via the change mode mechanism. 


R2..R7, PC, and PS are pushed onto the current stack. The saved PC addresses the 
instruction following the CHMU instruction. 


The CALL_PAL CHMU instruction is provided for VAX compatibility only. 
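

The new_mode selection in the four change mode instructions (Sections 2.1.3 through
2.1.6) can be tabulated with a few lines of Python. This is an illustrative sketch:
the function and table names are hypothetical, and the mode numbering 0 through 3
for kernel through user is taken from the stack pointer CASE statements in
Section 2.1.10. Python's min is used in place of MINU, which is equivalent for
these small non-negative values.

```python
MODES = ("kernel", "executive", "supervisor", "user")  # encodings 0..3


def new_mode(instruction, current_mode):
    """Return the mode entered by a change mode instruction from current_mode."""
    if instruction == "CHMK":
        return 0                     # always kernel
    if instruction == "CHME":
        return min(1, current_mode)  # MINU(1, PS<CM>)
    if instruction == "CHMS":
        return min(2, current_mode)  # MINU(2, PS<CM>)
    if instruction == "CHMU":
        return current_mode          # mode is unchanged
    raise ValueError(instruction)


table = {
    insn: [MODES[new_mode(insn, cm)] for cm in range(4)]
    for insn in ("CHMK", "CHME", "CHMS", "CHMU")
}
```

Each row of table lists the resulting mode for a caller in kernel, executive,
supervisor, and user mode respectively. The MINU form means that a change mode
instruction never moves a process to a less privileged mode than its current one.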


2.1.7 Generate Software Trap


Format:


CALL_PAL GENTRAP    !PALcode format


Operation:


{initiate GENTRAP exception with new_mode=kernel}
! R16 contains the value encoding of the software trap


Exceptions:


Kernel Stack Not Valid Halt


Instruction Mnemonics:


CALL_PAL GENTRAP Generate Software Trap


Description:


The GENTRAP instruction is provided for reporting runtime software conditions. It
switches to Kernel mode and pushes R2..R7, the updated PC, and PS on the Kernel
stack. It then dispatches to the address in the GENTRAP SCB vector. See
Section 6.6.


The value in R16 identifies the particular software condition that has occurred. The 
encoding for the software trap values is given in the software calling standard for 
the system. 


2.1.8 Probe Memory Access


Format:


CALL_PAL PROBE    !PALcode format


Operation:


! R16 contains the base address
! R17 contains the signed offset
! R18 contains the access mode
! R0 receives the completion status
!   1 if success
!   0 if failure

first ← R16
last ← {R16+R17}

IF R18<1:0> GTU PS<CM> THEN
    probe_mode ← R18<1:0>
ELSE
    probe_mode ← PS<CM>

IF ACCESS(first, probe_mode) AND ACCESS(last, probe_mode) THEN
    R0 ← 1
ELSE
    R0 ← 0


Exceptions:


Translation Not Valid


Instruction Mnemonics: 


CALL_PAL PROBER Probe for Read Access 
CALL_PAL PROBEW _ Probe for Write Access 


Description: 


The PROBE instruction checks the read or write accessibility of the first and last 
byte specified by the base address and the signed offset; the bytes in between are 
not checked. 


System software must check all pages between the two bytes if they are to be
accessed. If both bytes are accessible, PROBE returns the value 1 in R0; otherwise,
PROBE returns 0. The Fault On Read and Fault On Write PTE bits are not checked.
A Translation Not Valid exception is signaled only if the mapping structures cannot
be accessed. A Translation Not Valid exception is signaled only if the first or
second level PTE is invalid.


The protection is checked against the less privileged of the modes specified by
R18<1:0> and the Current Mode (PS<CM>). See Section 6.2 for access mode
encodings.


PROBE is only intended to check a single datum for accessibility. It does not check
all intervening pages because this could result in excessive interrupt latency.
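

The mode selection and end-point check described above can be sketched in Python. This is an illustrative model only, not PALcode: the function names `probe` and `demo_access` are hypothetical, and `access(addr, mode)` is an assumed stand-in for the ACCESS function of the Operation section. It relies only on what the text states, namely that the numerically larger of R18<1:0> and PS<CM> is the less privileged mode.

```python
MASK64 = (1 << 64) - 1

def probe(r16, r17, r18, ps_cm, access):
    """Model of PROBER/PROBEW: 1 if both end bytes are accessible, else 0."""
    first = r16 & MASK64
    last = (r16 + r17) & MASK64      # base plus signed offset, wrapped to 64 bits
    requested = r18 & 0x3            # R18<1:0>
    # Use the less privileged (numerically larger) of the two modes.
    probe_mode = requested if requested > ps_cm else ps_cm
    ok = access(first, probe_mode) and access(last, probe_mode)
    return 1 if ok else 0

def demo_access(addr, mode):
    """Hypothetical protection rule: mode 0 may touch everything,
    other modes only addresses below 0x1000."""
    return mode == 0 or addr < 0x1000

print(probe(0x100, 0x10, 0, 0, demo_access))    # kernel caller, kernel probe
print(probe(0x2000, 0x10, 3, 0, demo_access))   # kernel caller probing for mode 3
```

Only the first and last bytes are tested, so a caller that intends to touch the whole range still has to probe each intervening page itself.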


2.1.9 Read Processor Status

Format:


CALL_PAL RD_PS !PALcode format


Operation:


R0 <- PS
Exceptions: 
None 


Instruction Mnemonics: 


CALL_PAL RD_PS Read Processor Status 


Description: 


The RD_PS instruction returns the Processor Status (PS) in register R0. The
Processor Status is described in Section 6.2. The PS<SP_ALIGN> field is always
a zero on a RD_PS.


2.1.10 Return from Exception or Interrupt


Format:


CALL_PAL REI !PALcode format


Operation: 
! See Chapter 6
! for information on interrupted registers

IF SP<5:0> NE 0 THEN
    {illegal operand}

tmp1 <- (SP)                ! Get saved R2
tmp2 <- (SP+8)              ! Get saved R3
tmp3 <- (SP+16)             ! Get saved R4
tmp4 <- (SP+24)             ! Get saved R5
tmp5 <- (SP+32)             ! Get saved R6
tmp6 <- (SP+40)             ! Get saved R7
tmp7 <- (SP+48)             ! Get new PC
tmp8 <- (SP+56)             ! Get new PS

ps_chk <- tmp8              ! Copy new ps
ps_chk<cm> <- 0             ! Clear cm field
ps_chk<sp_align> <- 0       ! Clear sp_align field
ps_chk<sw> <- 0             ! Clear Software Field
intr_flag <- 0              ! Clear except/inter/mcheck flag

{clear lock_flag}



! If current mode is not kernel check the new ps is valid.
IF {ps<cm> NE 0} AND
   {{tmp8<cm> LT ps<cm>} OR {ps_chk NE 0}} THEN
BEGIN
    {illegal operand}
END

sp <- {sp + 8*8} OR tmp8<sp_align>
IF {internal registers for stack pointers} THEN
    CASE ps<cm> BEGIN
        [0]: ipr_ksp <- sp
        [1]: ipr_esp <- sp
        [2]: ipr_ssp <- sp
        [3]: ipr_usp <- sp
    ENDCASE
    CASE tmp8<cm> BEGIN
        [0]: sp <- ipr_ksp
        [1]: sp <- ipr_esp
        [2]: sp <- ipr_ssp
        [3]: sp <- ipr_usp
    ENDCASE
ELSE
    (pcbb + 8*ps<cm>) <- sp
    sp <- (pcbb + 8*tmp8<cm>)
ENDIF

R2 <- tmp1
R3 <- tmp2
R4 <- tmp3
R5 <- tmp4
R6 <- tmp5
R7 <- tmp6
PC <- tmp7
PS <- tmp8<12:00>


{Initiate interrupts or AST interrupts that are now pending} 


Exceptions: 


Access Violation 

Fault on Read 

Illegal Operand 

Kernel Stack Not Valid Halt

Translation Not Valid 


Instruction Mnemonics: 


CALL_PAL REI Return from Exception or Interrupt 


Description: 


The REI instruction pops the PS, PC, and saved R2...R7 from the current stack and
holds them in temporary registers.


The new PS is checked for validity and consistency. If it is invalid or inconsistent, 
an illegal operand exception occurs; otherwise the operation continues. A kernel 
to nonkernel REI with a new PS<IPL> not equal to zero may yield UNDEFINED 
results. 
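

The validity test applied to the new PS can be sketched as follows. This is a minimal model that follows the Operation pseudocode for REI; the function name `rei_new_ps_is_legal` and its parameters are hypothetical. It takes the mode fields and the residual `ps_chk` value as separate arguments rather than assuming any particular bit layout for the PS.

```python
def rei_new_ps_is_legal(cur_cm, new_cm, ps_chk):
    """cur_cm: current PS<CM>; new_cm: CM field of the popped PS;
    ps_chk: the popped PS with its CM, SP_ALIGN and SW fields cleared."""
    if cur_cm == 0:
        return True      # current mode is kernel: the check is skipped
    if new_cm < cur_cm:
        return False     # would return to a more privileged mode
    if ps_chk != 0:
        return False     # some other PS field is nonzero
    return True

print(rei_new_ps_is_legal(0, 3, 0x1F00))   # kernel may load any new PS
print(rei_new_ps_is_legal(3, 0, 0))        # mode 3 may not return to kernel
```

A result of False corresponds to the illegal operand exception. The UNDEFINED case for a kernel to nonkernel REI with a nonzero PS<IPL> is not modeled here.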


The current stack pointer is then saved and a new stack pointer is selected according 
to the new PS<CM> field. R2 through R7 are restored using the saved values held in 
the temporary registers. A check is made to determine if an AST or other interrupt 
is pending (see Section 6.7.6). 


If the enabling conditions are present for an interrupt or AST interrupt at the
completion of this instruction, the interrupt or AST interrupt occurs before the next
instruction.


When an REI is issued, the current stack must be writable from the current mode or an
Access Violation may occur.


IMPLEMENTATION NOTE
This is necessary so that an implementation can choose
to clear the lock flag by doing a STx_C to above the
top-of-stack after popping PS, PC, and saved R2...R7 off
the current stack.


2.1.11 Read System Cycle Counter


Format:


CALL_PAL RSCC !PALcode format


Operation:


R0 <- {System Cycle Counter}


Exceptions: 


None 


Instruction Mnemonics: 


CALL_PAL RSCC Read System Cycle Counter 


Description: 


The RSCC instruction writes register R0 with the value of the system cycle counter.
This counter is an unsigned 64-bit integer that increments at the same rate as the
process cycle counter. The cycle counter frequency, which is the number of times
the system cycle counter gets incremented per second rounded to a 64-bit integer, is
given in the HWRPB. \ (See Platform Section, Chapter 3). \


The system cycle counter is suitable for timing a general range of intervals to within
10% error and may be used for detailed performance characterization. It is required
on all implementations. SCC is required for every processor, and each processor in
a multiprocessor system has its own private, independent SCC.


Notes:

1. Processor initialization starts the SCC at 0.


2. SCC is required for every processor and each processor in a multiprocessor system
has its own private, independent SCC.


3. SCC is monotonically increasing. On the same processor, the values returned
by two successive reads of SCC must either be equal or the value of the second
must be greater (unsigned) than the first.


4. SCC ticks are never lost so long as the SCC is accessed at least once per each PCC
overflow period (2**32 PCC increments) during periods when the hardware clock
interrupt remains blocked. The hardware clock interrupt is blocked whenever
the IPL is at or above CLOCK_IPL or whenever the processor enters console I/O
mode from program I/O mode.


5. The 64-bit SCC may be constructed from the 32-bit PCC hardware counter and
a 32-bit PALcode software counter. As part of the hardware clock interrupt
processing, PALcode increments the software counter whenever a PCC wrap is
detected. Thus, SCC ticks may be lost only when PALcode fails to detect PCC
wraps. In a machine where the PCC is incremented at a 1 nsec rate, this may
occur when hardware clock interrupts are blocked for greater than 4 seconds.


6. An implementation-dependent mechanism must exist to, when enabled, cause
the RSCC instruction, as implemented by standard PALcode, to always return
a zero in R0. This mechanism must be usable by privileged system software. A
similar mechanism must exist for RPCC. Implementations are allowed to have
just a single mechanism which when enabled causes both RSCC and RPCC to
return zero.
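

The construction in note 5 can be illustrated with a small Python model. This is a sketch under stated assumptions, not PALcode: the class name `SoftwareSCC` and the wrap test (the observed PCC being smaller than the previously observed PCC) are hypothetical choices made for illustration. The final line works out the PCC overflow period at the 1 nsec tick rate mentioned in the note.

```python
PCC_BITS = 32
PCC_MOD = 1 << PCC_BITS

class SoftwareSCC:
    """64-bit SCC built from a 32-bit PCC plus a 32-bit software wrap counter."""

    def __init__(self):
        self.high = 0        # software counter kept by the (modeled) PALcode
        self.last_pcc = 0    # PCC value seen at the previous observation

    def observe(self, pcc):
        """Must be called at least once per PCC overflow period."""
        pcc &= PCC_MOD - 1
        if pcc < self.last_pcc:                       # PCC wrapped since last look
            self.high = (self.high + 1) & (PCC_MOD - 1)
        self.last_pcc = pcc
        return (self.high << PCC_BITS) | pcc

scc = SoftwareSCC()
a = scc.observe(0xFFFFFFF0)
b = scc.observe(0x00000010)      # the PCC has wrapped once
print(hex(a), hex(b))

# At a 1 ns tick, a 32-bit PCC wraps after 2**32 ns (a little over 4 seconds).
print(PCC_MOD * 1e-9)
```

If two wraps occur between observations the model undercounts, which corresponds to the lost ticks described in notes 4 and 5.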


2.1.12 Swap AST Enable

Format:


CALL_PAL SWASTEN !PALcode format


Operation:


R0 <- ZEXT(ASTEN<PS<CM>>)
ASTEN<PS<CM>> <- R16<0>


{check for pending ASTs} 


Exceptions: 


None 


Instruction Mnemonics:


CALL_PAL SWASTEN Swap AST Enable for Current Mode 


Description: 


The SWASTEN instruction swaps the AST enable bit for the current mode. The
new state for the enable bit is supplied in register R16<0> and previous state of the
enable bit is returned, zero extended, in R0.


A check is made to determine if an AST interrupt is pending (see Section 6.7.6.6).


If the enabling conditions are present for an AST interrupt at the completion of this
instruction, the AST occurs before the next instruction.
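

The swap can be sketched in Python as below. This is an illustrative model that assumes ASTEN is held as a 4-bit mask with one enable bit per access mode, indexed by PS<CM>. The function name `swasten` is hypothetical, and the pending-AST check is not modeled.

```python
def swasten(asten, ps_cm, r16):
    """Return (r0, new_asten) for a SWASTEN executed in mode ps_cm."""
    r0 = (asten >> ps_cm) & 1            # ZEXT(ASTEN<PS<CM>>): previous enable bit
    new_bit = r16 & 1                    # R16<0>: requested new state
    new_asten = (asten & ~(1 << ps_cm) & 0xF) | (new_bit << ps_cm)
    return r0, new_asten

print(swasten(0b0000, 3, 1))   # enable ASTs for mode 3
print(swasten(0b1111, 0, 0))   # disable ASTs for mode 0
```

Because the previous state is returned, a caller can disable ASTs around a critical region and later restore the earlier setting by passing the returned value back in R16.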


2.1.13 Write Processor Status Software Field


Format:


CALL_PAL WR_PS_SW !PALcode format


Operation:


PS<SW> <- R16<1:0>


Exceptions: 


None 


Instruction Mnemonics: 


CALL_PAL WR_PS_SW Write Processor Status Software Field 


Description: 


The WR_PS_SW instruction writes the Processor Status software field (PS<SW>)
with the low order two bits of R16. The Processor Status is described in Section 6.2.


2.2 OpenVMS Alpha Queue Data Types


The following sections describe the queue data types that are manipulated by the 
OpenVMS queue PALcode. Section 2.3 describes the PALcode instructions that 
perform the manipulation. 


2.2.1 Absolute Longword Queues 


A longword queue is a circular, doubly linked list. A longword queue entry is specified 
by its address. Each longword queue entry is linked to the next with a pair of 
longwords. A queue is classified by the type of link it uses. Absolute longword 
queues use absolute addresses as links. 


The first (lowest addressed) longword is the forward link; it specifies the address of 
the succeeding longword queue entry. The second (highest addressed) longword is 
the backward link; it specifies the address of the preceding longword queue entry. 


A longword queue is specified by a longword queue header which is identical to a 
pair of longword queue linkage longwords. The forward link of the header is the 
address of the entry termed the head of the longword queue. The backward link of 
the header is the address of the entry termed the tail of the longword queue. The 
forward link of the tail points to the header. 


An empty longword queue is specified by its header at address H, as shown in
Figure 2-1. If an entry at address B is inserted into an empty longword queue (at
either the head or tail), the longword queue shown in Figure 2-2 results. Figures
2-3, 2-4, and 2-5, respectively, illustrate the results of subsequent insertion of an
entry at address A at the head, insertion of an entry at address C at the tail, and
removal of the entry at address B.
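

The sequence just described can be replayed on a small memory model. This is a sketch only: `mem` is a hypothetical dictionary mapping longword addresses to values, the helper names are illustrative, and it assumes (as on VAX) that both links of an empty absolute queue header contain the header's own address.

```python
def make_empty(mem, h):
    mem[h] = h          # forward link points back to the header
    mem[h + 4] = h      # backward link points back to the header

def insert_after(mem, pred, entry):
    succ = mem[pred]
    mem[entry] = succ           # entry forward link
    mem[entry + 4] = pred       # entry backward link
    mem[succ + 4] = entry       # successor now points back at entry
    mem[pred] = entry           # predecessor now points forward at entry

def remove(mem, entry):
    succ, pred = mem[entry], mem[entry + 4]
    mem[pred] = succ
    mem[succ + 4] = pred

def walk(mem, h):
    out, p = [], mem[h]
    while p != h:
        out.append(p)
        p = mem[p]
    return out

H, A, B, C = 0x100, 0x200, 0x300, 0x400
mem = {}
make_empty(mem, H)
insert_after(mem, H, B)             # one entry: head and tail insertion agree
insert_after(mem, H, A)             # A at the head
insert_after(mem, mem[H + 4], C)    # C at the tail (after the current tail)
remove(mem, B)
print([hex(a) for a in walk(mem, H)])
```

Inserting at the head is insertion after the header, and inserting at the tail is insertion after the entry named by the header's backward link, so a single helper serves both cases.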


2.2.2 Self-Relative Longword Queues 


Self-relative longword queues use displacements from longword queue entries as
links. Longword queue entries are linked by a pair of longwords. The first longword
(lowest addressed) is the forward link; it is a displacement of the succeeding longword
queue entry from the present entry. The second longword (highest addressed) is the
backward link; it is the displacement of the preceding longword queue entry from
the present entry. A longword queue is specified by a longword queue header, which
also consists of two longword links.


An empty longword queue is specified by its header at address H. Since the longword
queue is empty, the self-relative links are zero, as shown in Figure 2-6.


Four types of operations can be performed on self-relative queues: insert at head,
insert at tail, remove from head, and remove from tail. Furthermore, these
operations are interlocked to allow cooperating processes in a multiprocessor system
to access a shared list without additional synchronization. A hardware-supported,
interlocked memory access mechanism is used to modify the queue header. Bit <0>
of the queue header is used as a secondary interlock and is set when the queue is
being accessed.


If an interlocked queue CALL_PAL instruction encounters the secondary interlock
set, then, in the absence of exceptions, it terminates after setting R0 to -1 to indicate
failure to gain access to the queue. If the secondary interlock bit is not set, then
it is set during the interlocked queue operation and is cleared upon completion of
the operation. This prevents other interlocked queue CALL_PAL instructions from
operating on the same queue.


If both the secondary interlock is set and an exception condition occurs, it is
UNPREDICTABLE whether the exception will be reported.


Figures 2-7, 2-8, and 2-9, respectively, illustrate the results of subsequent insertion
of an entry at address B at the head, insertion of an entry at address A at the tail,
and insertion of an entry at address C at the tail.


Figures 2-9, 2-8, and 2-7 (in that order) illustrate the effect of removal at the tail
and removal at the head.
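

To make the displacement encoding concrete, the following Python sketch fills in the links for a header followed by entries B, A, and C (the order produced by the insertions described above) and then walks the queue. The names `link_self_relative` and `walk`, the dictionary `mem`, and the example addresses are all hypothetical. Displacements are kept as plain signed integers rather than 32-bit fields.

```python
def link_self_relative(mem, header, entries):
    """Write forward/backward displacements for header followed by entries."""
    ring = [header] + list(entries)
    n = len(ring)
    for i, addr in enumerate(ring):
        succ = ring[(i + 1) % n]
        pred = ring[(i - 1) % n]
        mem[addr] = succ - addr        # forward link: displacement to successor
        mem[addr + 4] = pred - addr    # backward link: displacement to predecessor

def walk(mem, header):
    out, p = [], header + mem[header]
    while p != header:
        out.append(p)
        p += mem[p]                    # follow the forward displacement
    return out

H, A, B, C = 0x1000, 0x2000, 0x3000, 0x4000
empty = {}
link_self_relative(empty, H, [])
print(empty[H], empty[H + 4])          # both links of an empty queue are zero

mem = {}
link_self_relative(mem, H, [B, A, C])
print([hex(a) for a in walk(mem, H)])
```

Because every link is relative to the entry that holds it, the same link values remain valid if the whole queue is mapped at a different base address, which is what allows the queue to be shared between processes.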


Figure 2-1: Empty Absolute Longword Queue

Figure 2-3: Absolute Longword Queue with Two Entries

Figure 2-4: Absolute Longword Queue with Three Entries




Figure 2-8: Self-Relative Longword Queue with Two Entries

Figure 2-9: Self-Relative Longword Queue with Three Entries


2.2.3 Absolute Quadword Queues


A quadword queue is a circular, doubly linked list. A quadword queue entry is
specified by its address. Each quadword queue entry is linked to the next with
a pair of quadwords. A queue is classified by the type of link it uses. Absolute
quadword queues use absolute addresses as links.


The first (lowest addressed) quadword is the forward link; it specifies the address of 
the succeeding quadword queue entry. The second (highest addressed) quadword is 
the backward link; it specifies the address of the preceding quadword queue entry. 


A quadword queue is specified by a quadword queue header which is identical to a
pair of quadword queue linkage quadwords. The forward link of the header is the
address of the entry termed the head of the quadword queue. The backward link of
the header is the address of the entry termed the tail of the quadword queue. The
forward link of the tail points to the header.


An empty quadword queue is specified by its header at address H, as shown in
Figure 2-10. If an entry at address B is inserted into an empty quadword queue (at
either the head or tail), the quadword queue shown in Figure 2-11 results. Figures
2-12, 2-13, and 2-14, respectively, illustrate the results of subsequent insertion of
an entry at address A at the head, insertion of an entry at address C at the tail, and
removal of the entry at address B.


2.2.4 Self-Relative Quadword Queues


Self-relative quadword queues use displacements from quadword queue entries
as links. Quadword queue entries are linked by a pair of quadwords. The
first quadword (lowest addressed) is the forward link; it is a displacement of the
succeeding quadword queue entry from the present entry. The second quadword
(highest addressed) is the backward link; it is the displacement of the preceding
quadword queue entry from the present entry. A quadword queue is specified by a
quadword queue header, which also consists of two quadword links.


An empty quadword queue is specified by its header at address H. Since the
quadword queue is empty, the self-relative links are zero, as shown in Figure 2-15.


Four types of operations can be performed on self-relative queues: insert at head, 
insert at tail, remove from head, and remove from tail. Furthermore, these 
operations are interlocked to allow cooperating processes in a multiprocessor system 
to access a shared list without additional synchronization. A hardware-supported, 
interlocked memory access mechanism is used to modify the queue header. Bit <0> 
of the queue header is used as a secondary interlock and is set when the queue is 
being accessed. 


If an interlocked queue CALL_PAL instruction encounters the secondary interlock
set, then, in the absence of exceptions, it terminates after setting R0 to -1 to indicate
failure to gain access to the queue. If the secondary interlock bit is not set, then
it is set during the interlocked queue operation and is cleared upon completion of
the operation. This prevents other interlocked queue CALL_PAL instructions from
operating on the same queue.


If both the secondary interlock is set and an exception condition occurs, it is 
UNPREDICTABLE whether the exception will be reported. 


Figures 2-16, 2-17, and 2-18, respectively, illustrate the results of subsequent
insertion of an entry at address B at the head, insertion of an entry at address
A at the tail, and insertion of an entry at address C at the tail.


Figures 2-18, 2-17, and 2-16 (in that order) illustrate the effect of removal at the
tail and removal at the head.


Figure 2-10: Empty Absolute Quadword Queue

Figure 2-11: Absolute Quadword Queue with One Entry


Figure 2-12: Absolute Quadword Queue with Two Entries

Figure 2-13: Absolute Quadword Queue with Three Entries


Figure 2-15: Empty Self-Relative Quadword Queue

Figure 2-16: Self-Relative Quadword Queue with One Entry


Figure 2-17: Self-Relative Quadword Queue with Two Entries


2.3 Unprivileged OpenVMS Queue PALcode Instructions


The following unprivileged PALcode instructions perform atomic modification of the
queue data types that are described in Section 2.2.


Table 2-3: VAX Queue PALcode Instruction Summary


Mnemonic    Operation

INSQHIL     Insert into longword queue at head, interlocked
INSQHILR    Insert into longword queue at head, interlocked, resident
INSQHIQ     Insert into quadword queue at head, interlocked
INSQHIQR    Insert into quadword queue at head, interlocked, resident
INSQTIL     Insert into longword queue at tail, interlocked
INSQTILR    Insert into longword queue at tail, interlocked, resident
INSQTIQ     Insert into quadword queue at tail, interlocked
INSQTIQR    Insert into quadword queue at tail, interlocked, resident
INSQUEL     Insert into longword queue
INSQUEQ     Insert into quadword queue
REMQHIL     Remove from longword queue at head, interlocked
REMQHILR    Remove from longword queue at head, interlocked, resident
REMQHIQ     Remove from quadword queue at head, interlocked
REMQHIQR    Remove from quadword queue at head, interlocked, resident
REMQTIL     Remove from longword queue at tail, interlocked
REMQTILR    Remove from longword queue at tail, interlocked, resident
REMQTIQ     Remove from quadword queue at tail, interlocked
REMQTIQR    Remove from quadword queue at tail, interlocked, resident
REMQUEL     Remove from longword queue
REMQUEQ     Remove from quadword queue


2.3.1 Insert Entry into Longword Queue at Head Interlocked


Format: 


CALL_PAL INSQHIL !PALcode format 


Operation: 


! R16 contains the address of the queue header
! R17 contains the address of the new entry
! R0 receives status:
!   -1 if the secondary interlock was set
!    0 if the queue was not empty before adding this entry
!    1 if the queue was empty before adding this entry
!
! Must have write access to header and queue entries
! Header and entries must be quadword aligned.
! Header cannot be equal to entry.
!
! check entry and header alignment and
! that the header and entry not same location and
! that the header and entry are valid 32 bit addresses

IF {R16<2:0> NE 0} OR {R17<2:0> NE 0} OR {R16 EQ R17} OR
   {SEXT(R16<31:0>) NE R16} OR {SEXT(R17<31:0>) NE R17} THEN
BEGIN
    {illegal operand exception}
END
N <- {retry amount}                     ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp0 <- (R16))         ! Acquire hardware interlock.
    IF tmp0<0> EQ 1 THEN                ! Try to set secondary interlock.
        R0 <- -1, {return}              ! Already set
    done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
    N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}       ! Retry exceeded

MB

tmp1 <- SEXT(tmp0<31:0>)
IF {tmp1<2:1> NE 0} THEN                ! Check alignment
BEGIN                                   ! Release secondary interlock.
    (R16) <- tmp0
    {illegal operand exception}
END

! Check if following addresses can be written
! without causing a memory management exception:
!   entry
!   header + tmp1
IF {all memory accesses can NOT be completed} THEN
BEGIN                                   ! Release secondary interlock.
    (R16) <- tmp0
    {initiate memory management fault}
END

! All accesses can be done so enqueue the entry
tmp2 <- SEXT({R16 - R17}<31:0>)
(R17)<31:0> <- tmp1 + tmp2              ! Forward link
(R17 + 4)<31:0> <- tmp2                 ! Backward link
(R16 + tmp1 + 4)<31:0> <- -tmp1 - tmp2  ! Successor back link
MB
(R16)<31:0> <- -tmp2                    ! Forward link of header
                                        ! Release lock
IF tmp1 EQ 0 THEN
    R0 <- 1                             ! Queue was empty
ELSE
    R0 <- 0                             ! Queue was not empty
END
Exceptions: 


Access Violation 
Fault on Read 

Fault on Write 
Illegal Operand 
Translation Not Valid 


Instruction Mnemonics: 


CALL_PAL INSQHIL Insert into Longword Queue at Head Interlocked


Description: 


If the secondary interlock is clear, INSQHIL inserts the entry specified in R17 into 
the self-relative queue following the header specified in R16. 


If the entry inserted was the first one in the queue, R0 is set to a 1; else it is set to
a 0. The insertion is a non-interruptible operation. The insertion is interlocked to
prevent concurrent interlocked insertions or removals at the head or tail of the same
queue by another process, in a multiprocessor environment. Before the insertion, the
processor validates that the entire operation can be completed. This ensures that if
a memory management exception occurs, the queue is left in a consistent state (see
Chapters 3 and 6). If the instruction fails to acquire the secondary interlock after
"N" retry attempts, then (in the absence of exceptions) R0 is set to -1. The
value "N" is implementation dependent. \ The selected initial value of N is 20.\
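

The link arithmetic of the head insertion can be followed in a single-threaded Python model. This is a sketch under stated assumptions, not an implementation of the instruction: `mem` is a hypothetical dictionary of longwords, `insqhil` and `sext32` are illustrative names, and the load-locked/store-conditional retry loop, the memory barriers, the 32-bit address check, and the memory management probes are all omitted.

```python
def sext32(x):
    """Sign-extend the low 32 bits of x."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def insqhil(mem, r16, r17):
    """Insert entry r17 after header r16; returns the R0 status value."""
    if (r16 & 7) or (r17 & 7) or r16 == r17:
        raise ValueError("illegal operand")
    tmp0 = mem[r16]
    if tmp0 & 1:
        return -1                          # secondary interlock already set
    mem[r16] = tmp0 | 1                    # set secondary interlock
    tmp1 = sext32(tmp0)                    # old forward link of the header
    if tmp1 & 6:
        mem[r16] = tmp0                    # release interlock before faulting
        raise ValueError("illegal operand")
    tmp2 = sext32(r16 - r17)               # header relative to the new entry
    mem[r17] = sext32(tmp1 + tmp2)         # entry forward link
    mem[r17 + 4] = tmp2                    # entry backward link
    mem[r16 + tmp1 + 4] = sext32(-tmp1 - tmp2)   # successor backward link
    mem[r16] = sext32(-tmp2)               # header forward link, clears interlock
    return 1 if tmp1 == 0 else 0

H, A, B = 0x1000, 0x2000, 0x3000
mem = {H: 0, H + 4: 0}                     # empty self-relative queue
print(insqhil(mem, H, B))                  # queue was empty
print(insqhil(mem, H, A))                  # queue was not empty
```

When the queue is empty, the successor is the header itself, so the successor back link store updates the header's own backward link.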


2.3.2 Insert Entry into Longword Queue at Head Interlocked Resident


Format:


CALL_PAL INSQHILR !PALcode format


Operation:


! R16 contains the address of the queue header
! R17 contains the address of the new entry
! R0 receives status:
!   -1 if the secondary interlock was set
!    0 if the queue was not empty before adding this entry
!    1 if the queue was empty before adding this entry
!
! Must have write access to header and queue entries
! Header and entries must be quadword aligned.
! Header cannot be equal to entry.
! All parts of the Queue must be memory resident

N <- {retry amount}                     ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp0 <- (R16))         ! Acquire hardware interlock.
    IF tmp0<0> EQ 1 THEN                ! Try to set secondary interlock.
        R0 <- -1, {return}              ! Already set
    done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
    N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}       ! Retry exceeded

MB

tmp1 <- SEXT(tmp0<31:0>)
tmp2 <- SEXT({R16 - R17}<31:0>)         ! Enqueue the entry
(R17)<31:0> <- tmp1 + tmp2              ! Forward link of entry.
(R17 + 4)<31:0> <- tmp2                 ! Backward link of entry.
(R16 + tmp1 + 4)<31:0> <- -tmp1 - tmp2  ! Successor back link

MB

(R16)<31:0> <- -tmp2                    ! Forward link of header
                                        ! Release the lock
IF tmp1 EQ 0 THEN
    R0 <- 1                             ! Queue was empty
ELSE
    R0 <- 0                             ! Queue was not empty
END
Exceptions: 


Illegal Operand


Instruction Mnemonics:


CALL_PAL INSQHILR Insert Entry into Longword Queue
at Head Interlocked Resident


Description:


If the secondary interlock is clear, INSQHILR inserts the entry specified in R17 into
the self-relative queue following the header specified in R16.


If the entry inserted was the first one in the queue, R0 is set to a 1; else it is set to
a 0. The insertion is a non-interruptible operation. The insertion is interlocked to
prevent concurrent interlocked insertions or removals at the head or tail of the same
queue by another process, in a multiprocessor environment. If the instruction fails
to acquire the secondary interlock after "N" retry attempts, then (in the absence of
exceptions) R0 is set to -1. The value "N" is implementation dependent. \ The
selected initial value of N is 20.\


This instruction requires that the queue be memory resident and that the queue
header and elements are quadword aligned. No alignment or memory management
checks are made before starting queue modifications to verify these requirements.
Therefore, should any of these requirements not be met, the queue may be left in
an unpredictable state and an illegal operand fault may be reported.


2.3.3 Insert Entry into Quadword Queue at Head Interlocked


Format:


CALL_PAL INSQHIQ !PALcode format


Operation: 


! R16 contains the address of the queue header
! R17 contains the address of the new entry
! R0 receives status:
!   -1 if the secondary interlock was set
!    0 if the queue was not empty before adding this entry
!    1 if the queue was empty before adding this entry
!
! Must have write access to header and queue entries
! Header and entries must be octaword aligned.
! Header cannot be equal to entry.
!
! check entry and header alignment and
! that the header and entry not same location

IF {R16<3:0> NE 0} OR {R17<3:0> NE 0} OR {R16 EQ R17} THEN
BEGIN
    {illegal operand exception}
END
N <- {retry amount}                     ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp0 <- (R16))         ! Acquire hardware interlock.
    IF tmp0<0> EQ 1 THEN                ! Try to set secondary interlock.
        R0 <- -1, {return}              ! Already set
    done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
    N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}       ! Retry exceeded

MB

tmp1 <- tmp0                            ! Old forward link of header
IF {tmp1<3:1> NE 0} THEN                ! Check Alignment
BEGIN                                   ! Release secondary interlock
    (R16) <- tmp1
    {illegal operand exception}
END

! Check if following addresses can be written
! without causing a memory management exception:
!   entry
!   header + tmp1
IF {all memory accesses can NOT be completed} THEN
BEGIN                                   ! Release secondary interlock
    (R16) <- tmp1
    {initiate memory management fault}
END

! All accesses can be done so enqueue the entry
tmp2 <- R16 - R17
(R17) <- tmp1 + tmp2                    ! Forward link
(R17 + 8) <- tmp2                       ! Backward link
(R16 + tmp1 + 8) <- -tmp1 - tmp2        ! Successor back link
MB
(R16) <- -tmp2                          ! Forward link of header
                                        ! Release the lock.
IF tmp1 EQ 0 THEN
    R0 <- 1                             ! Queue was empty
ELSE
    R0 <- 0                             ! Queue was not empty
END
Exceptions: 


Access Violation 
Fault on Read 

Fault on Write 
Illegal Operand
Translation Not Valid 


Instruction Mnemonics: 


CALL_PAL INSQHIQ Insert into Quadword Queue at Head Interlocked


Description:


If the secondary interlock is clear, INSQHIQ inserts the entry specified in R17 into
the self-relative queue following the header specified in R16.


If the entry inserted was the first one in the queue, R0 is set to a 1; else it is set to
a 0. The insertion is a non-interruptible operation. The insertion is interlocked to
prevent concurrent interlocked insertions or removals at the head or tail of the same
queue by another process, in a multiprocessor environment. Before the insertion, the
processor validates that the entire operation can be completed. This ensures that if
a memory management exception occurs, the queue is left in a consistent state (see
Chapters 3 and 6). If the instruction fails to acquire the secondary interlock after
"N" retry attempts, then (in the absence of exceptions) R0 is set to -1. The
value "N" is implementation dependent. \ The selected initial value of N is 20.\


2.3.4 Insert Entry into Quadword Queue at Head Interlocked Resident


Format:


CALL_PAL INSQHIQR !PALcode format


Operation: 


! R16 contains the address of the queue header
! R17 contains the address of the new entry
! R0 receives status:
!   -1 if the secondary interlock was set
!    0 if the queue was not empty before adding this entry
!    1 if the queue was empty before adding this entry
!
! Must have write access to header and queue entries
! Header and entries must be octaword aligned.
! Header cannot be equal to entry.
! All parts of the Queue must be memory resident

N <- {retry amount}                     ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp0 <- (R16))         ! Acquire hardware interlock.
    IF tmp0<0> EQ 1 THEN                ! Try to set secondary interlock.
        R0 <- -1, {return}              ! Already set
    done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
    N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}       ! Retry exceeded

MB

tmp1 <- tmp0                            ! Old forward link of header
tmp2 <- R16 - R17                       ! Enqueue the entry
(R17) <- tmp1 + tmp2                    ! Forward link of entry.
(R17 + 8) <- tmp2                       ! Backward link of entry.
(R16 + tmp1 + 8) <- -tmp1 - tmp2        ! Successor back link

MB

(R16) <- -tmp2                          ! Forward link of header,
                                        ! Release the lock
IF tmp1 EQ 0 THEN
    R0 <- 1                             ! Queue was empty
ELSE
    R0 <- 0                             ! Queue was not empty
END
Exceptions: 
Illegal Operand


Instruction Mnemonics:


CALL_PAL INSQHIQR Insert Entry into Quadword Queue
at Head Interlocked Resident


Description:


If the secondary interlock is clear, INSQHIQR inserts the entry specified in R17 into
the self-relative queue following the header specified in R16.


If the entry inserted was the first one in the queue, R0 is set to a 1; else it is set to
a 0. The insertion is a non-interruptible operation. The insertion is interlocked to
prevent concurrent interlocked insertions or removals at the head or tail of the same
queue by another process, in a multiprocessor environment. If the instruction fails
to acquire the secondary interlock after "N" retry attempts, then (in the absence of
exceptions) R0 is set to -1. The value "N" is implementation dependent. \ The
selected initial value of N is 20.\


This instruction requires that the queue be memory resident and that the queue
header and elements are octaword aligned. No alignment or memory management
checks are made before starting queue modifications to verify these requirements.
Therefore, should any of these requirements not be met, the queue may be left in
an unpredictable state and an illegal operand fault may be reported.


2.3.5 Insert Entry into Longword Queue at Tail Interlocked


Format:


CALL_PAL INSQTIL !PALcode format


Operation: 


! R16 contains the address of the queue header
! R17 contains the address of the new entry
! R0 receives status:
!   -1 if the secondary interlock was set
!    0 if the queue was not empty before adding this entry
!    1 if the queue was empty before adding this entry
!
! Must have write access to header and queue entries
! Header and entries must be quadword aligned.
! Header cannot be equal to entry.
!
! check entry and header alignment and
! that the header and entry not same location and
! that the header and entry are valid 32 bit addresses

IF {R16<2:0> NE 0} OR {R17<2:0> NE 0} OR {R16 EQ R17} OR
   {SEXT(R16<31:0>) NE R16} OR {SEXT(R17<31:0>) NE R17} THEN
BEGIN
    {illegal operand exception}
END
N <- {retry amount}                     ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp0 <- (R16))         ! Acquire hardware interlock.
    IF tmp0<0> EQ 1 THEN                ! Try to set secondary interlock.
        R0 <- -1, {return}              ! Already set
    done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
    N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}       ! Retry exceeded

MB

tmp1 <- SEXT(tmp0<31:0>)
tmp2 <- SEXT(tmp0<63:32>)

IF {tmp1<2:1> NE 0} OR {tmp2<2:0> NE 0} THEN    ! Check Alignment
BEGIN                                   ! Release secondary interlock
    (R16) <- tmp0
    {illegal operand exception}
END

! Check if following addresses can be written
! without causing a memory management exception:
!   entry
!   header + (header + 4)
IF {all memory accesses can NOT be completed} THEN
BEGIN                                   ! Release secondary interlock
    (R16) <- tmp0
    {initiate memory management fault}
END

! All Accesses can be done so enqueue entry
tmp3 <- SEXT({R16 - R17}<31:0>)
(R17)<31:0> <- tmp3                     ! Forward link
(R17 + 4)<31:0> <- tmp2 + tmp3          ! Backward link
IF {tmp2 NE 0} THEN                     ! Forward link of predecessor
    (R16+tmp2)<31:0> <- -tmp3 - tmp2
ELSE
    tmp1 <- SEXT({-tmp3 - tmp2}<31:0>)
(R16+4)<31:0> <- -tmp3                  ! Backward link of header
MB
(R16)<31:0> <- tmp1                     ! Forward link, release lock
IF tmp1 EQ -tmp3 THEN
    R0 <- 1                             ! Queue was empty
ELSE
    R0 <- 0                             ! Queue was not empty
END
Exceptions: 


Access Violation 
Fault on Read 

Fault on Write 

Illegal Operand 
Translation Not Valid 


Instruction Mnemonics: 


CALL_PAL INSQTIL Insert into Longword Queue at Tail Interlocked 


Description: 


If the secondary interlock is clear, INSQTIL inserts the entry specified in R17 into 
the self-relative queue preceding the header specified in R16. 


If the entry inserted was the first one in the queue, R0 is set to 1; otherwise it is
set to 0. The insertion is a non-interruptible operation. The insertion is interlocked to
prevent concurrent interlocked insertions or removals at the head or tail of the same
queue by another process in a multiprocessor environment. Before performing any
part of the operation, the processor validates that the insertion can be completed.
This ensures that if a memory management exception occurs, the queue is left in
a consistent state (see Chapters 3 and 6). If the instruction fails to acquire the
secondary interlock after "N" retry attempts, then (in the absence of exceptions) R0
is set to -1. The value "N" is implementation dependent. \The selected initial
value of N is 20.\
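The link arithmetic in the Operation section can be traced with a small model. The following is a minimal sketch, not PALcode: it assumes memory is a Python dict mapping addresses to signed longwords, and it omits the secondary interlock, the alignment checks, and the memory management probes. The names `sext32` and `insqtil_links` are hypothetical helpers introduced only for this illustration.

```python
def sext32(x):
    """Sign-extend the low 32 bits of x (models SEXT(x<31:0>))."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def insqtil_links(mem, header, entry):
    """Apply the INSQTIL link updates; return the R0 status (1 or 0).

    mem is an assumed model of memory: dict of address -> signed longword.
    """
    tmp1 = mem[header]                 # forward link of header
    tmp2 = mem[header + 4]             # backward link of header (offset to tail)
    tmp3 = sext32(header - entry)      # offset from new entry back to header
    mem[entry] = tmp3                  # forward link of entry (points at header)
    mem[entry + 4] = sext32(tmp2 + tmp3)          # backward link of entry
    if tmp2 != 0:
        mem[header + tmp2] = sext32(-tmp3 - tmp2)  # forward link of old tail
    else:
        tmp1 = sext32(-tmp3 - tmp2)    # queue was empty: header now points at entry
    mem[header + 4] = sext32(-tmp3)    # backward link of header
    mem[header] = tmp1                 # forward link of header
    return 1 if tmp1 == -tmp3 else 0


# Usage: an empty self-relative queue has both header links equal to zero.
mem = {0x1000: 0, 0x1004: 0}
first = insqtil_links(mem, 0x1000, 0x2000)
second = insqtil_links(mem, 0x1000, 0x3000)
```

Every link is a displacement from the entry that holds it, so the queue stays valid if the whole structure is mapped at a different base address.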
2.3.6 Insert Entry into Longword Queue at Tail Interlocked Resident


Format: 


CALL_PAL INSQTILR                !PALcode format


Operation: 


! R16 contains the address of the queue header
! R17 contains the address of the new entry
! R0 receives status:
!   -1 if the secondary interlock was set
!    0 if the entry was not empty before adding this entry
!    1 if the entry was empty before adding this entry
!
! Must have write access to header and queue entries
! Header and entries must be quadword aligned.
! Header cannot be equal to entry.
! All parts of the Queue must be memory resident

N <- {retry amount}                       ! Implementation-specific
REPEAT
  LOAD_LOCKED (tmp0 <- (R16))             ! Acquire hardware interlock.
  IF tmp0<0> EQ 1 THEN                    ! Try to set secondary interlock.
    R0 <- -1, {return}                    ! Already set
  done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
  N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}         ! Retry exceeded
MB
tmp1 <- SEXT(tmp0<31:0>)
tmp2 <- SEXT(tmp0<63:32>)
tmp3 <- SEXT({R16 - R17}<31:0>)
(R17)<31:0> <- tmp3                       ! Forward link
(R17 + 4)<31:0> <- tmp2 + tmp3            ! Backward link
IF {tmp2 NE 0} THEN                       ! Forward link of predecessor
  (R16+tmp2)<31:0> <- -tmp3 - tmp2
ELSE
  tmp1 <- SEXT({-tmp3 - tmp2}<31:0>)
(R16+4)<31:0> <- -tmp3                    ! Backward link of header
MB
(R16)<31:0> <- tmp1                       ! Forward link
                                          ! Release the lock
IF tmp1 EQ -tmp3 THEN
  R0 <- 1                                 ! Queue was empty
ELSE
  R0 <- 0                                 ! Queue was not empty
END
Exceptions:
Illegal Operand 


Instruction Mnemonics: 


CALL_PAL INSQTILR Insert Entry into Longword Queue 
at Tail Interlocked Resident 


Description: 


If the secondary interlock is clear, INSQTILR inserts the entry specified in R17 into
the self-relative queue preceding the header specified in R16.

If the entry inserted was the first one in the queue, R0 is set to 1; otherwise it is
set to 0. The insertion is a non-interruptible operation. The insertion is interlocked to
prevent concurrent interlocked insertions or removals at the head or tail of the same
queue by another process in a multiprocessor environment. If the instruction fails
to acquire the secondary interlock after "N" retry attempts, then (in the absence of
exceptions) R0 is set to -1. The value "N" is implementation dependent. \The
selected initial value of N is 20.\


This instruction requires that the queue be memory resident and that the queue 
header and elements are quadword aligned. No alignment or memory management 
checks are made before starting queue modifications to verify these requirements. 
Therefore, should any of these requirements not be met, the queue may be left in 
an unpredictable state and an illegal operand fault may be reported. 
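The bounded retry loop that sets the secondary interlock (bit 0 of the header's first longword) is shared by all of the interlocked queue instructions. The sketch below models it under stated assumptions: `Cell` is a hypothetical stand-in for the header location, its `fail_first` parameter only simulates STORE_CONDITIONAL failures, and returning `None` stands for the path where the instruction sets R0 to -1. The default of 20 retries follows the note in the text above.

```python
class Cell:
    """Hypothetical model of the header location with a failing store-conditional."""

    def __init__(self, value, fail_first=0):
        self.value = value
        self.fail_first = fail_first   # number of store attempts that will fail

    def load_locked(self):
        return self.value

    def store_conditional(self, new_value):
        if self.fail_first > 0:
            self.fail_first -= 1
            return False               # lock flag was lost; nothing is written
        self.value = new_value
        return True


def acquire_secondary_interlock(cell, retries=20):
    """Return the original header value on success, or None (R0 would be -1)."""
    n = retries
    while True:
        tmp0 = cell.load_locked()
        if tmp0 & 1:                   # secondary interlock already set
            return None
        done = cell.store_conditional(tmp0 | 1)
        n -= 1
        if done or n == 0:
            break
    return tmp0 if done else None


free = Cell(0x10, fail_first=2)
got = acquire_secondary_interlock(free)
busy = Cell(0x11)
flaky = Cell(0x20, fail_first=5)
```

Because the loop gives up after a fixed number of attempts, a caller has to be prepared to see the failure status and reissue the instruction.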
2.3.7 Insert Entry into Quadword Queue at Tail Interlocked


Format: 


CALL_PAL INSQTIQ                 !PALcode format


Operation: 


! R16 contains the address of the queue header
! R17 contains the address of the new entry
! R0 receives status:
!   -1 if the secondary interlock was set
!    0 if the entry was not empty before adding this entry
!    1 if the entry was empty before adding this entry
!
! Must have write access to header and queue entries
! Header and entries must be octaword aligned.
! Header cannot be equal to entry.
!
! Check entry and header alignment and
! that the header and entry are not the same location

IF {R16<3:0> NE 0} OR {R17<3:0> NE 0} OR {R16 EQ R17} THEN
BEGIN
  {illegal operand exception}
END
N <- {retry_amount}                       ! Implementation-specific
REPEAT
  LOAD_LOCKED (tmp0 <- (R16))             ! Acquire hardware interlock.
  IF tmp0<0> EQ 1 THEN                    ! Try to set secondary interlock.
    R0 <- -1, {return}                    ! Already set
  done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
  N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}         ! Retry exceeded
MB
tmp2 <- (R16+8)
IF {tmp1<3:1> NE 0} OR {tmp2<3:0> NE 0} THEN  ! Check Alignment
BEGIN                                     ! Release secondary interlock
  (R16) <- tmp1
  {illegal operand exception}
END
! Check if following addresses can be written
! without causing a memory management exception:
!   entry
!   header + (header + 8)
IF {all memory accesses can NOT be completed} THEN
BEGIN                                     ! Release secondary interlock
  (R16) <- tmp1
  {initiate memory management fault}
END
! All accesses can be done so enqueue the entry
tmp3 <- R16 - R17
(R17) <- tmp3                             ! Forward link
(R17 + 8) <- tmp2 + tmp3                  ! Backward link
IF {tmp2 NE 0} THEN                       ! Forward link of predecessor
  (R16+tmp2) <- -tmp3 - tmp2
ELSE
  tmp1 <- {-tmp3 - tmp2}
(R16+8) <- -tmp3                          ! Backward link of header
MB
(R16) <- tmp1                             ! Forward link
                                          ! Release the lock
IF tmp1 EQ -tmp3 THEN
  R0 <- 1                                 ! Queue was empty
ELSE
  R0 <- 0                                 ! Queue was not empty
END
Exceptions: 


Access Violation 
Fault on Read 

Fault on Write 

Illegal Operand 
Translation Not Valid 


Instruction Mnemonics: 


CALL_PAL INSQTIQ Insert into Quadword Queue at Tail Interlocked 
Description: 


If the secondary interlock is clear, INSQTIQ inserts the entry specified in R17 into 
the self-relative queue preceding the header specified in R16. 


If the entry inserted was the first one in the queue, R0 is set to 1; otherwise it is
set to 0. The insertion is a non-interruptible operation. The insertion is interlocked to
prevent concurrent interlocked insertions or removals at the head or tail of the same
queue by another process in a multiprocessor environment. Before performing any
part of the operation, the processor validates that the insertion can be completed.
This ensures that if a memory management exception occurs, the queue is left in
a consistent state (see Chapters 3 and 6). If the instruction fails to acquire the
secondary interlock after "N" retry attempts, then (in the absence of exceptions) R0
is set to -1. The value "N" is implementation dependent. \The selected initial
value of N is 20.\
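The operand checks that INSQTIQ performs before it touches the interlock reduce to three tests on R16 and R17. The sketch below is only an illustration of that predicate; the function name `insqtiq_operands_legal` is hypothetical, and a real implementation raises an illegal operand exception rather than returning a boolean.

```python
def insqtiq_operands_legal(r16, r17):
    """Mirror of: {R16<3:0> NE 0} OR {R17<3:0> NE 0} OR {R16 EQ R17} is false."""
    header_aligned = (r16 & 0xF) == 0   # header must be octaword aligned
    entry_aligned = (r17 & 0xF) == 0    # entry must be octaword aligned
    distinct = r16 != r17               # header cannot be equal to entry
    return header_aligned and entry_aligned and distinct


ok = insqtiq_operands_legal(0x1000, 0x2000)
misaligned_header = insqtiq_operands_legal(0x1008, 0x2000)
same_location = insqtiq_operands_legal(0x1000, 0x1000)
```

The low four bits must be clear because bit 0 of the header doubles as the secondary interlock and the remaining low bits are required to be zero by the octaword alignment rule.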
2.3.8 Insert Entry into Quadword Queue at Tail Interlocked Resident


Format: 


CALL_PAL INSQTIQR                !PALcode format


Operation: 


! R16 contains the address of the queue header
! R17 contains the address of the new entry
! R0 receives status:
!   -1 if the secondary interlock was set
!    0 if the entry was not empty before adding this entry
!    1 if the entry was empty before adding this entry
!
! Must have write access to header and queue entries
! Header and entries must be octaword aligned.
! Header cannot be equal to entry.
! All parts of the Queue must be memory resident

N <- {retry amount}                       ! Implementation-specific
REPEAT
  LOAD_LOCKED (tmp0 <- (R16))             ! Acquire hardware interlock.
  IF tmp0<0> EQ 1 THEN                    ! Try to set secondary interlock.
    R0 <- -1, {return}                    ! Already set
  done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
  N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}         ! Retry exceeded
MB
tmp2 <- (R16+8)
tmp3 <- R16 - R17
(R17) <- tmp3                             ! Forward link
(R17 + 8) <- tmp2 + tmp3                  ! Backward link
IF {tmp2 NE 0} THEN                       ! Forward link of predecessor
  (R16+tmp2) <- -tmp3 - tmp2
ELSE
  tmp1 <- {-tmp3 - tmp2}
(R16+8) <- -tmp3                          ! Backward link of header
MB
(R16) <- tmp1                             ! Forward link and release the lock
IF tmp1 EQ -tmp3 THEN
  R0 <- 1                                 ! Queue was empty
ELSE
  R0 <- 0                                 ! Queue was not empty
END
Exceptions:

Illegal Operand


Instruction Mnemonics: 


CALL_PAL INSQTIQR Insert Entry into Quadword Queue 
at Tail Interlocked Resident 


Description: 


If the secondary interlock is clear, INSQTIQR inserts the entry specified in R17 into
the self-relative queue preceding the header specified in R16.

If the entry inserted was the first one in the queue, R0 is set to 1; otherwise it is
set to 0. The insertion is a non-interruptible operation. The insertion is interlocked to
prevent concurrent interlocked insertions or removals at the head or tail of the same
queue by another process in a multiprocessor environment. If the instruction fails
to acquire the secondary interlock after "N" retry attempts, then (in the absence of
exceptions) R0 is set to -1. The value "N" is implementation dependent. \The
selected initial value of N is 20.\


This instruction requires that the queue be memory resident and that the queue 
header and elements are octaword aligned. No alignment or memory management 
checks are made before starting queue modifications to verify these requirements. 
Therefore, should any of these requirements not be met, the queue may be left in 
an unpredictable state and an illegal operand fault may be reported. 
2.3.9 Insert Entry into Longword Queue


Format: 


CALL_PAL INSQUEL                 !PALcode format


Operation: 


! R16 contains the address of the predecessor entry
!   or the 32 bit address of the 32 bit address of the
!   predecessor entry for INSQUEL/D
! R17 contains the address of the new entry
! R0 receives status:
!    0 if the queue was not empty before adding this entry
!    1 if the queue was empty before adding this entry
!
! Must have write access to header and queue entries

IF opcode EQ INSQUEL/D THEN
  tmp2 <- SEXT((R16)<31:0>)               ! Address of predecessor
ELSE
  tmp2 <- R16
IF {all memory accesses can be completed} THEN
BEGIN
  tmp<31:0> <- SEXT((tmp2)<31:0>)         ! Get forward link
  (R17)<31:0> <- tmp                      ! Set forward link
  (R17 + 4)<31:0> <- tmp2                 ! Backward link
  (SEXT((tmp2)<31:0>) + 4)<31:0> <- R17   ! Backward link of successor
  (tmp2)<31:0> <- R17                     ! Forward link of predecessor
  IF tmp EQ tmp2 THEN
    R0 <- 1
  ELSE
    R0 <- 0
  END
ELSE
BEGIN
  {initiate fault}
END
END
Exceptions: 


Access Violation 
Fault on Read 

Fault on Write 
Translation Not Valid 
Instruction Mnemonics:

CALL_PAL INSQUEL      Insert Entry into Longword Queue
CALL_PAL INSQUEL/D    Insert Entry into Longword Queue Deferred


Description: 


INSQUEL inserts the entry specified in R17 into the absolute queue following the
entry specified by the predecessor addressed by R16. INSQUEL/D performs the
same operation on the entry specified by the contents of the longword addressed by
R16.

In either case, if the entry inserted was the first one in the queue, a 1 is returned in
R0; otherwise a 0 is returned in R0. The insertion is a non-interruptible operation.
Before performing any part of the insertion, the processor validates that the entire
operation can be completed. This ensures that if a memory management exception
occurs, the queue is left in a consistent state (see Chapters 3 and 6).
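For comparison with the self-relative forms, the absolute queue insertion stores full addresses in the links. The following is a minimal sketch assuming a dict-based memory model (address -> longword) and ignoring the memory management probe; `insquel` and its `deferred` flag are hypothetical names used to illustrate both INSQUEL and INSQUEL/D.

```python
def insquel(mem, r16, r17, deferred=False):
    """Model of the INSQUEL link updates; returns the R0 status (1 or 0)."""
    pred = mem[r16] if deferred else r16   # INSQUEL/D: R16 addresses a pointer
    succ = mem[pred]                       # forward link of predecessor
    mem[r17] = succ                        # forward link of new entry
    mem[r17 + 4] = pred                    # backward link of new entry
    mem[succ + 4] = r17                    # backward link of successor
    mem[pred] = r17                        # forward link of predecessor
    return 1 if succ == pred else 0        # 1 when the queue was empty


# An empty absolute queue: both header links hold the header's own address.
mem = {0x1000: 0x1000, 0x1004: 0x1000}
first = insquel(mem, 0x1000, 0x2000)
second = insquel(mem, 0x1000, 0x3000)
```

Here the emptiness test is "forward link equals the predecessor's own address", whereas the self-relative queues use a zero displacement for the same condition.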
2.3.10 Insert Entry into Quadword Queue


Format: 


CALL_PAL INSQUEQ                 !PALcode format


Operation:

! R16 contains the address of the predecessor entry
!   or the address of the address of the
!   predecessor entry for INSQUEQ/D
! R17 contains the address of the new entry
! R0 receives status:
!    0 if the queue was not empty before adding this entry
!    1 if the queue was empty before adding this entry
!
! Must have write access to header and queue entries
! Header and entries must be octaword aligned

IF opcode EQ INSQUEQ/D THEN
  IF {R16<3:0> NE 0} THEN
  BEGIN
    {illegal operand exception}
  END
  tmp2 <- (R16)                           ! Address of predecessor
ELSE
  tmp2 <- R16
END
IF {tmp2<3:0> NE 0} OR {R17<3:0> NE 0} THEN
BEGIN
  {illegal operand exception}
END
IF {all memory accesses can be completed} THEN
BEGIN
  tmp <- (tmp2)                           ! Get forward link of entry
  IF {tmp<3:0> NE 0} THEN
  BEGIN                                   ! Check alignment
    {illegal operand exception}
  END
  (R17) <- tmp                            ! Set forward link of entry
  (R17 + 8) <- tmp2                       ! Backward link of entry
  (tmp + 8) <- R17                        ! Backward link of successor
  (tmp2) <- R17                           ! Forward link of predecessor
  IF tmp EQ tmp2 THEN
    R0 <- 1
  ELSE
    R0 <- 0
  END
ELSE
BEGIN
  {initiate fault}
END
END
Exceptions:

Access Violation
Fault on Read
Fault on Write
Translation Not Valid
Illegal Operand

Instruction Mnemonics:

CALL_PAL INSQUEQ      Insert Entry into Quadword Queue
CALL_PAL INSQUEQ/D    Insert Entry into Quadword Queue Deferred


Description: 


INSQUEQ inserts the entry specified in R17 into the absolute queue following the
entry specified by the predecessor addressed by R16. INSQUEQ/D performs the
same operation on the entry specified by the contents of the quadword addressed by
R16.

In either case, if the entry inserted was the first one in the queue, a 1 is returned
in R0; otherwise a 0 is returned in R0. The insertion is a non-interruptible
operation. Before performing any part of the insertion, the processor validates that
the entire operation can be completed. This ensures that if a memory management
exception occurs, the queue is left in a consistent state (see Chapters 3 and 6). R0
is unpredictable if an exception occurs. The relative order of reporting memory
management and illegal operand exceptions is unpredictable.
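The quadword form adds octaword alignment checks at three points: on R16 when it is used as a pointer (INSQUEQ/D), on the predecessor and new entry addresses, and on the forward link read from the predecessor. A sketch under the same assumptions as before (dict-based memory, no memory management probe); `IllegalOperand` and `insqueq` are hypothetical names for this illustration.

```python
class IllegalOperand(Exception):
    """Stands in for the illegal operand exception."""


def insqueq(mem, r16, r17, deferred=False):
    """Model of INSQUEQ / INSQUEQ/D; returns the R0 status (1 or 0)."""
    if deferred:
        if r16 & 0xF:
            raise IllegalOperand("R16 not octaword aligned")
        pred = mem[r16]                    # address of predecessor
    else:
        pred = r16
    if (pred & 0xF) or (r17 & 0xF):
        raise IllegalOperand("predecessor or entry not octaword aligned")
    succ = mem[pred]                       # forward link of predecessor
    if succ & 0xF:
        raise IllegalOperand("forward link not octaword aligned")
    mem[r17] = succ                        # forward link of entry
    mem[r17 + 8] = pred                    # backward link of entry
    mem[succ + 8] = r17                    # backward link of successor
    mem[pred] = r17                        # forward link of predecessor
    return 1 if succ == pred else 0


mem = {0x1000: 0x1000, 0x1008: 0x1000, 0x500: 0x1000}
first = insqueq(mem, 0x1000, 0x2000)
second = insqueq(mem, 0x500, 0x3000, deferred=True)
try:
    insqueq(mem, 0x1000, 0x4008)
    rejected = False
except IllegalOperand:
    rejected = True
```

All checks run before any store, so a rejected call leaves the modeled queue untouched.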
2.3.11 Remove Entry from Longword Queue at Head Interlocked


Format: 


CALL_PAL REMQHIL                 !PALcode format


Operation: 


! R16 contains the address of the queue header
! R0 receives status:
!   -1 if the secondary interlock was set
!    0 if the queue was empty
!    1 if entry removed and queue still not empty
!    2 if entry removed and queue empty
! R1 receives the address of the removed entry
!
! Must have write access to header and queue entries
! Header and entries must be quadword aligned.
!
! Check header alignment and
! that the header is a valid 32 bit address

IF {R16<2:0> NE 0} OR {SEXT(R16<31:0>) NE R16} THEN
BEGIN
  {illegal operand exception}
END
N <- {retry amount}                       ! Implementation-specific
REPEAT
  LOAD_LOCKED (tmp0 <- (R16))             ! Acquire hardware interlock.
  IF tmp0<0> EQ 1 THEN                    ! Try to set secondary interlock.
    R0 <- -1, {return}                    ! Already set
  done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
  N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}         ! Retry exceeded
MB
tmp1 <- SEXT(tmp0<31:0>)
IF tmp1<2:0> NE 0 THEN                    ! Check Alignment
BEGIN                                     ! Release secondary interlock
  (R16) <- tmp0
  {illegal operand exception}
END
! Check if the following can be done without
! causing a memory management exception:
!   read contents of header + tmp1 {if tmp1 NE 0}
!   write into header + tmp1 + (header + tmp1) {if tmp1 NE 0}
IF {all memory accesses can NOT be completed} THEN
BEGIN                                     ! Release secondary interlock
  (R16) <- tmp0
  {initiate memory management fault}
END
tmp2 <- SEXT({R16 + tmp1}<31:0>)
IF {tmp1 EQL 0} THEN
  tmp3 <- R16
ELSE
  tmp3 <- SEXT({tmp2 + SEXT((tmp2)<31:0>)})
IF tmp3<2:0> NE 0 THEN                    ! Check Alignment
BEGIN                                     ! Release secondary interlock
  (R16) <- tmp0
  {illegal operand exception}
END
(tmp3 + 4)<31:0> <- R16 - tmp3            ! Backward link of successor
MB
(R16)<31:0> <- tmp3 - R16                 ! Forward link of header
                                          ! Release lock
IF tmp1 EQ 0 THEN
  R0 <- 0                                 ! Queue was empty
ELSE
BEGIN
  IF {tmp3 - R16} EQ 0 THEN
    R0 <- 2                               ! Queue now empty
  ELSE
    R0 <- 1                               ! Queue not empty
  END
END
R1 <- tmp2                                ! Address of removed entry

Exceptions:

Access Violation
Fault on Read
Fault on Write
Illegal Operand
Translation Not Valid

Instruction Mnemonics:

CALL_PAL REMQHIL      Remove from Longword Queue at Head Interlocked

Description:

If the secondary interlock is clear, REMQHIL removes from the self-relative queue
the entry following the header, pointed to by R16, and the address of the removed
entry is returned in R1.

If the queue was empty prior to this instruction and secondary interlock succeeded,
a 0 is returned in R0. If the interlock succeeded and the queue was not empty at
the start of the removal and the queue is empty after the removal, a 2 is returned
in R0. If the instruction fails to acquire the secondary interlock after "N" retry
attempts, then (in the absence of exceptions) R0 is set to -1. The value "N" is
implementation dependent. \The selected initial value of N is 20.\

The removal is interlocked to prevent concurrent interlocked insertions or removals
at the head or tail of the same queue by another process in a multiprocessor
environment. The removal is a non-interruptible operation. Before performing
any part of the removal, the processor validates that the entire operation can be
completed. This ensures that if a memory management exception occurs, the queue
is left in a consistent state (see Chapters 3 and 6).
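The head removal can be traced with the same kind of model used for insertion. This is a sketch only: memory is assumed to be a dict of address -> signed longword, and the interlock, alignment checks, and memory management probes are left out. `sext32` and `remqhil_links` are hypothetical helper names.

```python
def sext32(x):
    """Sign-extend the low 32 bits of x (models SEXT(x<31:0>))."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def remqhil_links(mem, header):
    """Apply the REMQHIL link updates; return (R0 status, R1 address)."""
    tmp1 = mem[header]                     # forward link of header
    tmp2 = sext32(header + tmp1)           # address of the entry being removed
    if tmp1 == 0:
        tmp3 = header                      # empty queue: nothing follows
    else:
        tmp3 = sext32(tmp2 + mem[tmp2])    # successor of the removed entry
    mem[tmp3 + 4] = sext32(header - tmp3)  # backward link of successor
    mem[header] = sext32(tmp3 - header)    # forward link of header
    if tmp1 == 0:
        r0 = 0                             # queue was empty
    elif tmp3 == header:
        r0 = 2                             # queue now empty
    else:
        r0 = 1                             # queue not empty
    return r0, tmp2


# Header at 0x1000 with entries at 0x2000 and 0x3000 (self-relative links).
mem = {0x1000: 0x1000, 0x1004: 0x2000,
       0x2000: 0x1000, 0x2004: -0x1000,
       0x3000: -0x2000, 0x3004: -0x1000}
r_first = remqhil_links(mem, 0x1000)
r_second = remqhil_links(mem, 0x1000)
r_third = remqhil_links(mem, 0x1000)
```

In the empty case the value placed in R1 is simply the header address; callers are expected to look at R0 before using R1.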
2.3.12 Remove Entry from Longword Queue at Head Interlocked Resident


Format: 


CALL_PAL REMQHILR !PALcode format 


Operation: 


! R16 contains the address of the queue header
! R0 receives status:
!   -1 if the secondary interlock was set
!    0 if the queue was empty
!    1 if entry removed and queue still not empty
!    2 if entry removed and queue empty
! R1 receives the address of the removed entry
!
! Must have write access to header and queue entries
! Header and entries must be quadword aligned.
! All parts of the Queue must be memory resident

N <- {retry amount}                       ! Implementation-specific
REPEAT
  LOAD_LOCKED (tmp0 <- (R16))             ! Acquire hardware interlock.
  IF tmp0<0> EQ 1 THEN                    ! Try to set secondary interlock.
    R0 <- -1, {return}                    ! Already set
  done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
  N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}         ! Retry exceeded
MB
tmp1 <- SEXT(tmp0<31:0>)
tmp2 <- SEXT({R16 + tmp1}<31:0>)
IF {tmp1 EQL 0} THEN
  tmp3 <- R16
ELSE
  tmp3 <- SEXT({tmp2 + SEXT((tmp2)<31:0>)})
END
(tmp3 + 4)<31:0> <- R16 - tmp3            ! Backward link of successor
MB
(R16)<31:0> <- tmp3 - R16                 ! Forward link of header
                                          ! Release lock
IF tmp1 EQ 0 THEN
  R0 <- 0                                 ! Queue was empty
ELSE
BEGIN
  IF {tmp3 - R16} EQ 0 THEN
    R0 <- 2                               ! Queue now empty
  ELSE
    R0 <- 1                               ! Queue not empty
  END
END
R1 <- tmp2                                ! Address of removed entry
Exceptions:

Illegal Operand


Instruction Mnemonics: 


CALL_PAL REMQHILR Remove Entry from Longword Queue 
at Head Interlocked Resident 


Description: 


If the secondary interlock is clear, REMQHILR removes from the self-relative queue
the entry following the header, pointed to by R16, and the address of the removed
entry is returned in R1.

If the queue was empty prior to this instruction and secondary interlock succeeded,
a 0 is returned in R0. If the interlock succeeded and the queue was not empty at
the start of the removal and the queue is empty after the removal, a 2 is returned
in R0. If the instruction fails to acquire the secondary interlock after "N" retry
attempts, then (in the absence of exceptions) R0 is set to -1. The value "N" is
implementation dependent. \The selected initial value of N is 20.\


The removal is interlocked to prevent concurrent interlocked insertions or removals 
at the head or tail of the same queue by another process, in a multiprocessor 
environment. The removal is a non-interruptible operation. 


This instruction requires that the queue be memory resident and that the queue 
header and elements are quadword aligned. No alignment or memory management 
checks are made before starting queue modifications to verify these requirements. 
Therefore, should any of these requirements not be met, the queue may be left in 
an unpredictable state and an illegal operand fault may be reported. 
2.3.13 Remove Entry from Quadword Queue at Head Interlocked


Format: 


CALL_PAL REMQHIQ !PALcode format 


Operation: 


! R16 contains the address of the queue header
! R0 receives status:
!   -1 if the secondary interlock was set
!    0 if the queue was empty
!    1 if entry removed and queue still not empty
!    2 if entry removed and queue empty
! R1 receives the address of the removed entry
!
! Must have write access to header and queue entries
! Header and entries must be octaword aligned.
!
! Check header alignment

IF {R16<3:0> NE 0} THEN
BEGIN
  {illegal operand exception}
END
N <- {retry_amount}                       ! Implementation-specific
REPEAT
  LOAD_LOCKED (tmp0 <- (R16))             ! Acquire hardware interlock.
  IF tmp0<0> EQ 1 THEN                    ! Try to set secondary interlock.
    R0 <- -1, {return}                    ! Already set
  done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
  N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}         ! Retry exceeded
MB
IF tmp1<3:0> NE 0 THEN                    ! Check Alignment
BEGIN                                     ! Release secondary interlock
  (R16) <- tmp1
  {illegal operand exception}
END
! Check if the following can be done without
! causing a memory management exception:
!   read contents of header + tmp1 {if tmp1 NE 0}
!   write into header + tmp1 + (header + tmp1) {if tmp1 NE 0}
IF {all memory accesses can NOT be completed} THEN
BEGIN                                     ! Release secondary interlock
  (R16) <- tmp0
  {initiate memory management fault}
END
tmp2 <- R16 + tmp1
IF {tmp1 EQL 0} THEN
  tmp3 <- R16
ELSE
  tmp3 <- tmp2 + (tmp2)
IF tmp3<3:0> NE 0 THEN                    ! Check Alignment
BEGIN                                     ! Release secondary interlock
  (R16) <- tmp0
  {illegal operand exception}
END
(tmp3 + 8) <- R16 - tmp3                  ! Backward link of successor
MB
(R16) <- tmp3 - R16                       ! Forward link of header
                                          ! Release lock
IF tmp1 EQ 0 THEN
  R0 <- 0                                 ! Queue was empty
ELSE
BEGIN
  IF {tmp3 - R16} EQ 0 THEN
    R0 <- 2                               ! Queue now empty
  ELSE
    R0 <- 1                               ! Queue not empty
  END
END
R1 <- tmp2                                ! Address of removed entry
Exceptions: 


Access Violation 
Fault on Read 

Fault on Write 
Illegal Operand 
Translation Not Valid 


Instruction Mnemonics: 
CALL_PAL REMQHIQ Remove from Quadword Queue at Head Interlocked 


Description: 


If the secondary interlock is clear, REMQHIQ removes from the self-relative queue
the entry following the header, pointed to by R16, and the address of the removed
entry is returned in R1.

If the queue was empty prior to this instruction and secondary interlock succeeded,
a 0 is returned in R0. If the interlock succeeded and the queue was not empty at
the start of the removal, and the queue is empty after the removal, a 2 is returned
in R0. If the instruction fails to acquire the secondary interlock after "N" retry
attempts, then (in the absence of exceptions) R0 is set to -1. The value "N" is
implementation dependent. \The selected initial value of N is 20.\

The removal is interlocked to prevent concurrent interlocked insertions or removals
at the head or tail of the same queue by another process in a multiprocessor
environment. The removal is a non-interruptible operation. Before performing
any part of the removal, the processor validates that the entire operation can be
completed. This ensures that if a memory management exception occurs, the queue
is left in a consistent state (see Chapters 3 and 6).
2.3.14 Remove Entry from Quadword Queue at Head Interlocked Resident

Format:

CALL_PAL REMQHIQR                !PALcode format


Operation: 


! R16 contains the address of the queue header
! R0 receives status:
!   -1 if the secondary interlock was set
!    0 if the queue was empty
!    1 if entry removed and queue still not empty
!    2 if entry removed and queue empty
! R1 receives the address of the removed entry
!
! Must have write access to header and queue entries
! Header and entries must be octaword aligned.
! All parts of the Queue must be memory resident

N <- {retry amount}                       ! Implementation-specific
REPEAT
  LOAD_LOCKED (tmp0 <- (R16))             ! Acquire hardware interlock.
  IF tmp0<0> EQ 1 THEN                    ! Try to set secondary interlock.
    R0 <- -1, {return}                    ! Already set
  done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
  N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}         ! Retry exceeded
MB
tmp2 <- R16 + tmp1
IF {tmp1 EQL 0} THEN
  tmp3 <- R16
ELSE
  tmp3 <- tmp2 + (tmp2)
END
(tmp3 + 8) <- R16 - tmp3                  ! Backward link of successor
MB
(R16) <- tmp3 - R16                       ! Forward link of header
                                          ! Release lock
IF tmp1 EQ 0 THEN
  R0 <- 0                                 ! Queue was empty
ELSE
  IF {tmp3 - R16} EQ 0 THEN
    R0 <- 2                               ! Queue now empty
  ELSE
    R0 <- 1                               ! Queue not empty
END
R1 <- tmp2                                ! Address of removed entry
Exceptions:


Illegal Operand 


Instruction Mnemonics: 


CALL_PAL REMQHIQR     Remove Entry from Quadword Queue
                      at Head Interlocked Resident

Description:


If the secondary interlock is clear, REMQHIQR removes from the self-relative queue 
the entry following the header, pointed to by R16, and the address of the removed 
entry is returned in R1. 


If the queue was empty prior to this instruction and secondary interlock succeeded, 
a 0 is returned in R0. If the interlock succeeded and the queue was not empty at
the start of the removal, and the queue is empty after the removal, a 2 is returned
in R0. If the instruction fails to acquire the secondary interlock after "N" retry
attempts, then (in the absence of exceptions) R0 is set to -1. The value "N" is
implementation dependent. \The selected initial value of N is 20.\


The removal is interlocked to prevent concurrent interlocked insertions or removals 
at the head or tail of the same queue by another process, in a multiprocessor 
environment. The removal is a non-interruptible operation. 


This instruction requires that the queue be memory resident and that the queue 
header and elements are octaword aligned. No alignment or memory management 
checks are made before starting queue modifications to verify these requirements. 
Therefore, should any of these requirements not be met, the queue may be left in 
an unpredictable state and an illegal operand fault may be reported. 
2.3.15 Remove Entry from Longword Queue at Tail Interlocked

Format:

CALL_PAL REMQTIL                 !PALcode format


Operation: 


! R16 contains the address of the queue header
! R0 receives status:
!   -1 if the secondary interlock was set
!    0 if the queue was empty
!    1 if entry removed and queue still not empty
!    2 if entry removed and queue empty
! R1 receives the address of the removed entry
!
! Must have write access to header and queue entries
! Header and entries must be quadword aligned.
!
! Check header alignment and
! that the header is a valid 32 bit address

IF {R16<2:0> NE 0} OR {SEXT(R16<31:0>) NE R16} THEN
BEGIN
  {illegal operand exception}
END
N <- {retry_amount}                       ! Implementation-specific
REPEAT
  LOAD_LOCKED (tmp0 <- (R16))             ! Acquire hardware interlock.
  IF tmp0<0> EQ 1 THEN                    ! Try to set secondary interlock.
    R0 <- -1, {return}                    ! Already set
  done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
  N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}         ! Retry exceeded
MB
tmp1 <- SEXT(tmp0<31:0>)
tmp5 <- SEXT(tmp0<63:32>)
IF tmp5<2:0> NE 0 THEN                    ! Check alignment
BEGIN                                     ! Release secondary interlock
  (R16) <- tmp0
  {illegal operand exception}
END
! Check if the following can be done without
! causing a memory management exception:
!   read contents of header + (header + 4) {if tmp1 NE 0}
!   write into header + (header + 4)
!     + (header + 4 + (header + 4)) {if tmp1 NE 0}
IF {all memory accesses can NOT be completed} THEN
BEGIN                                     ! Release secondary interlock
  (R16) <- tmp0
  {initiate memory management fault}
END
addr <- SEXT({R16 + tmp5}<31:0>)
tmp2 <- SEXT({addr + SEXT((addr+4)<31:0>)}<31:0>)
IF tmp2<2:0> NE 0 THEN                    ! Check alignment
BEGIN                                     ! Release secondary interlock
  (R16) <- tmp0
  {illegal operand exception}
END
(R16 + 4)<31:0> <- tmp2 - R16             ! Backward link of header
IF {tmp2 EQL R16} THEN
  (R16)<31:0> <- 0                        ! Forward link, release lock
ELSE
BEGIN
  (tmp2)<31:0> <- R16 - tmp2              ! Forward link of predecessor
  MB
  (R16)<31:0> <- tmp1                     ! Release lock
END
IF tmp1 EQ 0 THEN
  R0 <- 0                                 ! Queue was empty
ELSE
BEGIN
  IF {tmp2 - R16} EQ 0 THEN
    R0 <- 2                               ! Queue now empty
  ELSE
    R0 <- 1                               ! Queue not empty
END
R1 <- addr                                ! Address of removed entry
Exceptions: 


Access Violation 
Fault on Read 

Fault on Write 
Illegal Operand 
Translation Not Valid 


Instruction Mnemonics: 


CALL_PAL REMQTIL Remove from Longword Queue at Tail Interlocked 


Description: 


If the secondary interlock is clear, REMQTIL removes from the self-relative queue 
the entry preceding the header, pointed to by R16, and the address of the removed 
entry is returned in R1.

If the queue was empty prior to this instruction and secondary interlock succeeded,
a 0 is returned in R0. If the interlock succeeded and the queue was not empty at
the start of the removal, and the queue is empty after the removal, a 2 is returned
in R0. If the instruction fails to acquire the secondary interlock after "N" retry
attempts, then (in the absence of exceptions) R0 is set to -1. The value "N" is
implementation dependent. \The selected initial value of N is 20.\


The removal is interlocked to prevent concurrent interlocked insertions or removals 
at the head or tail of the same queue by another process, in a multiprocessor 
environment. The removal is a non-interruptible operation. Before performing 
any part of the removal, the processor validates that the entire operation can be 
completed. This ensures that if a memory management exception occurs, the queue 
is left in a consistent state (see Chapters 3 and 6). 
2.3.16 Remove Entry from Longword Queue at Tail Interlocked Resident


Format: 


CALL_PAL REMQTILR !PALcode format 


Operation: 


! R16 contains the address of the queue header
! R0 receives status:
!   -1 if the secondary interlock was set
!    0 if the queue was empty
!    1 if entry removed and queue still not empty
!    2 if entry removed and queue empty
! R1 receives the address of the removed entry
!
! Must have write access to header and queue entries
! Header and entries must be quadword aligned.
! All parts of the Queue must be memory resident

N <- {retry amount}                       ! Implementation-specific
REPEAT
  LOAD_LOCKED (tmp0 <- (R16))             ! Acquire hardware interlock.
  IF tmp0<0> EQ 1 THEN                    ! Try to set secondary interlock.
    R0 <- -1, {return}                    ! Already set
  done <- STORE_CONDITIONAL ((R16) <- {tmp0 OR 1})
  N <- N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 <- -1, {return}         ! Retry exceeded
MB
tmp1 <- SEXT(tmp0<31:0>)
tmp5 <- SEXT(tmp0<63:32>)
addr <- SEXT({R16 + tmp5}<31:0>)
tmp2 <- SEXT({addr + SEXT((addr+4)<31:0>)}<31:0>)
(R16 + 4)<31:0> <- tmp2 - R16             ! Backward link of header
IF {tmp2 EQL R16} THEN
  (R16)<31:0> <- 0                        ! Forward link, release lock
ELSE
BEGIN
  (tmp2)<31:0> <- R16 - tmp2              ! Forward link of predecessor
  MB
  (R16)<31:0> <- tmp1                     ! Release lock
END
IF tmp1 EQ 0 THEN
  R0 <- 0                                 ! Queue was empty
ELSE
  IF {tmp2 - R16} EQ 0 THEN
    R0 <- 2                               ! Queue now empty
  ELSE
    R0 <- 1                               ! Queue not empty
  END
END
R1 <- addr                                ! Address of removed entry
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7 Exceptions: 
Illegal Operand 
| instruction Mnemonics: 


CALL_ PAL _REM@QTILR Remove Entry from Longword Queue 
at Tail Interlocked Resident 


Description: 


If the secondary interlock is clear, REMQTILR removes from the self-relative queue 
the entry preceding the header, pointed to by R16, and the address of the removed 
entry is returned in R1. 


If the queue was empty prior to this instruction and the secondary interlock succeeded,
a 0 is returned in R0. If the interlock succeeded and the queue was not empty at
the start of the removal, and the queue is empty after the removal, a 2 is returned
in R0; if the queue is still not empty after the removal, a 1 is returned in R0. If the
instruction fails to acquire the secondary interlock after "N" retry attempts, then (in
the absence of exceptions) R0 is set to -1. The value "N" is implementation
dependent. \ The selected initial value of N is 20.\


The removal is interlocked to prevent concurrent interlocked insertions or removals
at the head or tail of the same queue by another process, in a multiprocessor 
environment. The removal is a non-interruptible operation. 


This instruction requires that the queue be memory resident and that the queue
header and elements be quadword aligned. No alignment or memory management
checks are made before starting queue modifications to verify these requirements.
Therefore, should any of these requirements not be met, the queue may be left in
an unpredictable state and an illegal operand fault may be reported.


2.3.17 Remove Entry from Quadword Queue at Tail Interlocked


Format: 


CALL_PAL REMQTIQ      !PALcode format


Operation: 


! R16 contains the address of the queue header
! R0 receives status:
!   -1 if the secondary interlock was set
!    0 if the queue was empty
!    1 if entry removed and queue still not empty
!    2 if entry removed and queue empty
! R1 receives the address of the removed entry
! Must have write access to header and queue entries
! Header and entries must be octaword aligned.

! Check header alignment
IF {R16<3:0> NE 0} THEN
    BEGIN
    {illegal operand exception}
    END
N ← {retry amount}                           ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp0 ← (R16))               ! Acquire hardware interlock.
    IF tmp0<0> EQ 1 THEN                     ! Try to set secondary interlock.
        R0 ← -1, {return}                    ! Already set
    done ← STORE_CONDITIONAL ((R16) ← {tmp0 OR 1})
    N ← N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 ← -1, {return}             ! Retry exceeded
MB
tmp5 ← (R16+8)
IF tmp5<3:0> NE 0 THEN                       ! Check alignment
    BEGIN                                    ! Release secondary interlock
    (R16) ← tmp1
    {illegal operand exception}
    END
! Check if the following can be done without
! causing a memory management exception:
!   read contents of header + (header + 8) {if tmp1 NE 0}
!   write into header + (header + 8)
!   + (header + 8 + (header + 8)) {if tmp1 NE 0}
IF {all memory accesses can NOT be completed} THEN
    BEGIN                                    ! Release secondary interlock
    (R16) ← tmp1
    {initiate memory management fault}
    END
addr ← R16 + tmp5
tmp2 ← addr + (addr + 8)
IF tmp2<3:0> NE 0 THEN                       ! Check alignment
    BEGIN                                    ! Release secondary interlock
    (R16) ← tmp1
    {illegal operand exception}
    END
(R16 + 8) ← tmp2 - R16                       ! Backward link of header
IF {tmp2 EQL R16} THEN
    (R16) ← 0                                ! Forward link, release lock
ELSE
    BEGIN
    (tmp2) ← R16 - tmp2                      ! Forward link of predecessor
    MB
    (R16) ← tmp1                             ! Release lock
    END
END
IF tmp1 EQ 0 THEN
    R0 ← 0                                   ! Queue was empty
ELSE
    BEGIN
    IF {tmp2 - R16} EQ 0 THEN
        R0 ← 2                               ! Queue now empty
    ELSE
        R0 ← 1                               ! Queue not empty
    END
END
R1 ← addr                                    ! Address of removed entry


Exceptions: 


Access Violation 
Fault on Read 

Fault on Write 

Illegal Operand 
Translation Not Valid 


Instruction Mnemonics:


CALL_PAL REMQTIQ Remove from Quadword Queue at Tail Interlocked 
Description: 


If the secondary interlock is clear, REMQTIQ removes from the self-relative queue
the entry preceding the header, pointed to by R16, and the address of the removed
entry is returned in R1.


If the queue was empty prior to this instruction and the secondary interlock succeeded,
a 0 is returned in R0. If the interlock succeeded and the queue was not empty at
the start of the removal, and the queue is empty after the removal, a 2 is returned
in R0; if the queue is still not empty after the removal, a 1 is returned in R0. If the
instruction fails to acquire the secondary interlock after "N" retry attempts, then (in
the absence of exceptions) R0 is set to -1. The value "N" is implementation
dependent. \ The selected initial value of N is 20.\


The removal is interlocked to prevent concurrent interlocked insertions or removals
at the head or tail of the same queue by another process, in a multiprocessor
environment. The removal is a non-interruptible operation. Before performing
any part of the removal, the processor validates that the entire operation can be
completed. This ensures that if a memory management exception occurs, the queue
is left in a consistent state (see Chapters 3 and 6).


2.3.18 Remove Entry from Quadword Queue at Tail Interlocked Resident


Format:


CALL_PAL REMQTIQR !PALcode format 


Operation: 


! R16 contains the address of the queue header
! R0 receives status:
!   -1 if the secondary interlock was set
!    0 if the queue was empty
!    1 if entry removed and queue still not empty
!    2 if entry removed and queue empty
! R1 receives the address of the removed entry
! Must have write access to header and queue entries
! Header and entries must be octaword aligned.
! All parts of the Queue must be memory resident

N ← {retry amount}                           ! Implementation-specific
REPEAT
    LOAD_LOCKED (tmp0 ← (R16))               ! Acquire hardware interlock.
    IF tmp0<0> EQ 1 THEN                     ! Try to set secondary interlock.
        R0 ← -1, {return}                    ! Already set
    done ← STORE_CONDITIONAL ((R16) ← {tmp0 OR 1})
    N ← N - 1
UNTIL {done EQ 1} OR {N EQ 0}
IF done NEQ 1, R0 ← -1, {return}             ! Retry exceeded
MB
tmp5 ← (R16+8)
addr ← R16 + tmp5
tmp2 ← addr + (addr + 8)
(R16 + 8) ← tmp2 - R16                       ! Backward link of header
IF {tmp2 EQL R16} THEN
    (R16) ← 0                                ! Forward link, release lock
ELSE
    BEGIN
    (tmp2) ← R16 - tmp2                      ! Forward link of predecessor
    MB
    (R16) ← tmp1                             ! Release lock
    END
END
IF tmp1 EQ 0 THEN
    R0 ← 0                                   ! Queue was empty
ELSE
    IF {tmp2 - R16} EQ 0 THEN
        R0 ← 2                               ! Queue now empty
    ELSE
        R0 ← 1                               ! Queue not empty
    END
R1 ← addr                                    ! Address of removed entry





Exceptions: 


Illegal Operand 


Instruction Mnemonics: 


CALL_PAL REMQTIQR Remove Entry from Quadword Queue 
at Tail Interlocked Resident 


Description: 


If the secondary interlock is clear, REMQTIQR removes from the self-relative queue
the entry preceding the header, pointed to by R16, and the address of the removed
entry is returned in R1.


If the queue was empty prior to this instruction and the secondary interlock succeeded,
a 0 is returned in R0. If the interlock succeeded and the queue was not empty at
the start of the removal, and the queue is empty after the removal, a 2 is returned
in R0; if the queue is still not empty after the removal, a 1 is returned in R0. If the
instruction fails to acquire the secondary interlock after "N" retry attempts, then (in
the absence of exceptions) R0 is set to -1. The value "N" is implementation
dependent. \ The selected initial value of N is 20.\


The removal is interlocked to prevent concurrent interlocked insertions or removals 
at the head or tail of the same queue by another process, in a multiprocessor 
environment. The removal is a non-interruptible operation. 


This instruction requires that the queue be memory resident and that the queue
header and elements be octaword aligned. No alignment or memory management
checks are made before starting queue modifications to verify these requirements.
Therefore, should any of these requirements not be met, the queue may be left in
an unpredictable state and an illegal operand fault may be reported.


2.3.19 Remove Entry from Longword Queue


Format: 


CALL_PAL REMQUEL      !PALcode format


Operation: 


! R16 contains the address of the entry to remove
!   or the address of the 32 bit address of the
!   entry for REMQUEL/D
! R0 receives status:
!   -1 if the queue was empty
!    0 if the queue is empty after removing an entry
!    1 if the queue is not empty after removing an entry
! R1 receives the address of the removed entry
! Must have write access to header and queue entries

IF opcode EQ REMQUEL/D THEN
    R1 ← SEXT((R16)<31:0>)
ELSE
    R1 ← SEXT(R16<31:0>)
IF {all memory accesses can be completed} THEN
    BEGIN
    tmp1 ← (R1)<31:0>                        ! Forward Link of Predecessor
    ((R1+4)<31:0>)<31:0> ← tmp1
    tmp2 ← (R1+4)<31:0>                      ! Backward Link of Successor
    ((R1)<31:0>+4)<31:0> ← tmp2
    R0 ← 1                                   ! Queue not empty
    IF {tmp1 EQ tmp2} THEN
        R0 ← 0                               ! Queue now empty
    IF {R1 EQ tmp2} THEN
        R0 ← -1                              ! Queue was empty
    END
ELSE
    BEGIN
    {initiate fault}
    END
END


Exceptions: 


Access Violation 
Fault on Read 
Fault on Write 


Translation Not Valid


Instruction Mnemonics:


CALL_PAL REMQUEL Remove Entry from Longword Queue 
CALL_PAL REMQUEL/D Remove Entry from Longword Queue Deferred 


Description: 


REMQUEL removes the entry addressed by R16 from the longword absolute queue.
The address of the removed entry is returned in R1. REMQUEL/D performs the
same operation on the queue entry addressed by the longword addressed by R16. 


In either case, if there was no entry in the queue to be removed, R0 is set to -1. If
there was an entry to remove and the queue is empty at the end of this instruction,
R0 is set to 0. If there was an entry to remove and the queue is not empty at the
end of this instruction, R0 is set to 1. The removal is a non-interruptible operation.
Before performing any part of the removal, the processor validates that the entire
operation can be completed. This ensures that if a memory management exception
occurs, the queue is left in a consistent state (see Chapters 3 and 6).
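

The unlinking and status rules above can be sketched informally in Python. This is a toy model under stated assumptions, not PALcode: memory is a dictionary mapping a byte address to the absolute longword address stored there, the deferred (/D) form and all fault checking are left out, and the name `remquel` for the helper is hypothetical.

```python
def remquel(mem, entry):
    """Unlink entry from an absolute longword queue (toy model).

    mem maps a byte address to the longword stored there.
    Returns the status that the Operation section places in R0.
    """
    flink = mem[entry]          # forward link: address of the successor
    blink = mem[entry + 4]      # backward link: address of the predecessor
    mem[blink] = flink          # predecessor's forward link skips the entry
    mem[flink + 4] = blink      # successor's backward link skips the entry
    if entry == blink:
        return -1               # entry pointed at itself: queue was empty
    if flink == blink:
        return 0                # only the header remains: queue now empty
    return 1                    # queue still holds at least one entry
```

The order of the two tests mirrors the pseudocode, where the "queue was empty" assignment is made last and therefore overrides the "queue now empty" result.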


2.3.20 Remove Entry from Quadword Queue


Format: 


CALL_PAL REMQUEQ      !PALcode format


Operation: 


! R16 contains the address of the entry to remove
!   or address of address of entry for REMQUEQ/D
! R0 receives status:
!   -1 if the queue was empty
!    0 if the queue is empty after removing an entry
!    1 if the queue is not empty after removing an entry
! R1 receives the address of the removed entry
! Must have write access to header and queue entries
! Header and entries must be octaword aligned

IF opcode EQ REMQUEQ/D THEN
    IF {R16<3:0> NE 0} THEN
        BEGIN
        {illegal operand exception}
        END
    R1 ← (R16)
ELSE
    R1 ← R16
IF {R1<3:0> NE 0} THEN                       ! Check alignment
    BEGIN
    {illegal operand exception}
    END
IF {all memory accesses can be completed} THEN
    BEGIN
    tmp1 ← (R1)                              ! Forward link of Predecessor
    IF {tmp1<3:0> NE 0} THEN
        BEGIN                                ! Check alignment
        {illegal operand exception}
        END
    tmp2 ← (R1+8)                            ! Find predecessor
    IF {tmp2<3:0> NE 0} THEN
        BEGIN                                ! Check alignment
        {illegal operand exception}
        END
    (tmp2) ← tmp1                            ! Update Forward link of predecessor
    ((R1)+8) ← tmp2
    R0 ← 1                                   ! Queue not empty
    IF {tmp1 EQ tmp2} THEN
        R0 ← 0                               ! Queue now empty
    IF {R1 EQ tmp2} THEN
        R0 ← -1                              ! Queue was empty
    END
ELSE
    BEGIN
    {initiate fault}
    END
END


Exceptions:


Access Violation 
Fault on Read 

Fault on Write 
Translation Not Valid 
Illegal Operand 


Instruction Mnemonics:


CALL_PAL REMQUEQ Remove Entry from Quadword Queue 
CALL_PAL REMQUEQ/D Remove Entry from Quadword Queue Deferred 


Description: 


REMQUEQ removes the queue entry addressed by R16 from the quadword absolute
queue. The address of the removed entry is returned in R1. REMQUEQ/D performs
the same operation on the queue entry addressed by the quadword addressed by
R16.


In either case, if there was no entry in the queue to be removed, R0 is set to -1. If
there was an entry to remove and the queue is empty at the end of this instruction,
R0 is set to 0. If there was an entry to remove and the queue is not empty at the
end of this instruction, R0 is set to 1. The removal is a non-interruptible operation.
Before performing any part of the removal, the processor validates that the entire
operation can be completed. This ensures that if a memory management exception
occurs, the queue is left in a consistent state (see Chapters 3 and 6). R0 and R1
are unpredictable if an exception occurs. The relative order of reporting memory
management and illegal operand exceptions is unpredictable.


2.4 Unprivileged VAX Compatibility PALcode Instructions


The Alpha architecture provides the following PALcode instructions for use in
translated VAX code. These instructions are not a permanent part of the architecture
and will not be available in some future implementations. They are provided to help
customers preserve VAX instruction atomicity assumptions in porting code from VAX
to Alpha. These calls should be made in user mode. They must not be used by any
code other than that generated by the VEST software translator and its supporting
runtime code (TIE).


\ When they are removed from the architecture, it would be good if they trapped in
a way that they could be functionally software emulated many years in the future,
even if the atomicity is not retained in the software emulation. This would allow
very old translated images to run in 1998 and beyond, but perhaps restricted to a
single processor and some restriction around AST delivery.


They may be removed and not emulated after the first two full generations of Alpha
implementations, that is, about 1995. \


2.4.1 Atomic Move Operation


Format: 
AMOVRR !PALcode format 
AMOVRM !PALcode format 
Operation: 
! R16 contains the first source
! R17 contains the first destination address
! R18 contains the first length
! R19 contains the second source
! R20 contains the second destination address
! R21 contains the second length

CASE
    AMOVRR:
        IF intr_flag EQ 0 THEN
            R18 ← 0
            {return}
        END
        intr_flag ← 0
        (R17) ← R16                          ! length specified by R18<1:0>
        (R20) ← R19                          ! length specified by R21<1:0>
        IF {both moves successful} THEN
            R18 ← 1
        ELSE
            R18 ← 0
        END
    AMOVRM:
        IF intr_flag EQ 0 THEN
            R18 ← 0
            {return}
        END
        intr_flag ← 0
        (R17) ← R16                          ! length specified by R18<1:0>
        IF R21<5:0> NE 0 THEN
            BEGIN
            IF R19<1:0> NE 00 OR R20<1:0> NE 00
                {Illegal operand exception}
            ELSE
                (R20) ← (R19)                ! length specified by R21<5:0>
            END
        IF {both moves successful} THEN
            R18 ← 1
        ELSE
            R18 ← 0
        END
ENDCASE


Exceptions:


AMOVRR:    Access Violation
           Fault On Write
           Translation Not Valid
AMOVRM:    Access Violation
           Fault On Read
           Fault On Write
           Illegal Operand
           Translation Not Valid


Instruction Mnemonics: 


CALL_PAL AMOVRR Atomic Move Register/Register 
CALL_PAL AMOVRM Atomic Move Register/Memory 


Description: 


NOTE 
The CALL_PAL AMOVxx instructions are only for the
support of translated VAX code. They will disappear 
from the architecture at some time in the future. They 
must be used only in translated VAX code and its 
support routines (TIE). 


CALL_PAL AMOVRR


The CALL_PAL AMOVRR instruction specifies two multiprocessor safe register
stores to arbitrary byte addresses. Either both stores are done or neither store is 
done. R18 is set to one if both stores are done, and zero otherwise. The two source 
registers are R16 and R19. The two destination byte addresses are in R17 and R20. 
The two lengths are specified in R18<1:0> and R21<1:0>. The length encoding is: 
00 - store byte, 01 - store word, 10 - store longword, 11 - store quadword. The low 
1, 2, 4, or 8 bytes of the source register are used, respectively. The unused bytes of 
the source registers are ignored. The unused bits of the length registers (R18<63:2> 
and R21<63:2>) should be zero (SBZ). 
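

The length encoding above can be illustrated with a short Python sketch. It shows only which bytes of a source register a given length code selects; the helper name `amov_store_bytes` is hypothetical, and little-endian byte order in memory is an assumption of the sketch rather than something stated in this section.

```python
def amov_store_bytes(source, length_code):
    """Return the bytes an AMOVRR store would write for one source register.

    length_code is the two-bit field (R18<1:0> or R21<1:0>):
    00 = byte, 01 = word, 10 = longword, 11 = quadword.
    Only the low two bits are examined; the remaining bits are SBZ.
    """
    size = 1 << (length_code & 0b11)              # 1, 2, 4, or 8 bytes
    low = source & ((1 << (8 * size)) - 1)        # unused high bytes are ignored
    return low.to_bytes(size, "little")           # assumed little-endian layout
```

For example, a length code of 10 selects the low four bytes of the source register and ignores the upper four.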


If, upon entry to the PALcode routine, the intr_flag is clear then the instruction 
sets R18 to zero and exits, doing no stores. Otherwise, intr_flag is cleared and the 
PALcode routine proceeds. This is the same per-processor intr_flag used by the RS 
and RC instructions. 


The AMOVRR memory addresses may be unaligned. If either store would result in 
a Translation Not Valid fault, Fault on Write, or Access Violation fault, neither store 
is done and the corresponding fault is taken. If both stores would result in faults, it 
is UNPREDICTABLE which one is taken.


NOTE
A fault does not set R18, since the instruction has not 
been completed. 


If both stores can be completed without faulting, they are both attempted 
using multiprocessor-safe LDQ_L..STQ_C sequences. If all the sequences store 
successfully with no interruption, the PALcode routine completes with R18 set to 
one. Otherwise, the PALcode routine completes with R18 set to zero. In addition, 
R16, R17, R19, R20 and R21 are UNPREDICTABLE upon return from the PALcode 
routine, even if an exception has occurred.


If the destinations overlap, the stores must appear to be done in the order specified.


CALL_PAL AMOVRM 

The CALL_PAL AMOVRM instruction specifies one multiprocessor safe register 
store to an arbitrary byte address, plus an atomic memory-to-memory move of 0 
to 63 aligned longwords. Either the store and the move are both done in their 
entirety or neither is done. R18 is set to one if both are done, and zero otherwise. 


The first source register is R16, the first destination address is in R17, and the first 
length is in R18. These three are specified exactly as in AMOVRR. 


The second source address is in R19, the second destination address is in R20, 
and the second length is in R21<5:0>. The length is a longword length, in the 
range 0 to 63 longwords (0 to 252 bytes). The unused bytes of the source register
R16 are ignored. The unused bits of the length registers (R18<63:2> and
R21<63:6>) should be zero (SBZ).


If, upon entry to the PALcode routine, the intr_flag is clear then the instruction 
sets R18 to zero and exits, doing no stores. Otherwise, intr_flag is cleared and the 
PALcode routine proceeds. This is the same per-processor intr_flag used by the RS 
and RC instructions. 


The memory address in R17 may be unaligned. 


If the length for the move is zero, no move is done, no memory accesses are made 
via R19 and R20, and no fault checking of these addresses is done. In this case, the 
move is always considered to have succeeded in determining the setting of R18. 


If the length in R21 is non-zero, the two addresses in R19 and R20 must be aligned 
longword addresses, otherwise an Illegal Operand exception is taken. 


If either the store or the move would result in a Translation Not Valid, Fault on Read, 
Fault on Write, or Access Violation fault, neither is done and the corresponding fault 
is taken. If both would result in faults, it is UNPREDICTABLE which one is taken. 


NOTE 
A fault does not set R18, since the instruction has not 
been completed. 


If both the store and the move can be completed without faulting, they are both
attempted, using multiprocessor-safe LDQ_L..STQ_C sequences for the store. If
all the operations store successfully with no interruption, the PALcode routine
completes with R18 set to one. Otherwise, the PALcode routine completes with
R18 set to zero. In addition, R16, R17, R19, R20 and R21 are UNPREDICTABLE
upon return from the PALcode routine, even if an exception has occurred.


If the memory fields overlap, the store must appear to be done first, followed by the
move. The ordering of the reads and writes of the move is unspecified. Thus, if the
move destination overlaps the move source, the move results are UNPREDICTABLE.


These instructions contain no implicit MB. 


Notes: 


•  Typical use of these instructions would be a sequence starting with CALL_PAL
   RS and ending with CALL_PAL AMOVxx, Bxx R18,label. The failure path from
   the conditional branch would eventually go back to the RS instruction. When
   such a sequence succeeds, it has done everything from the RS up to and including
   the CALL_PAL AMOVxx completely with no interrupts or exceptions.


•  The CALL_PAL AMOVxx instruction is typically followed by a conditional
   branch on R18. If the CALL_PAL AMOVxx is likely to succeed, the conditional
   branch should be a FORWARD branch on failure (BEQ R18,forward_label)
   or backward branch on success (BNE R18,backward_label), to match the
   architected branch-prediction rule.
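

The RS ... AMOVxx retry pattern in the first note can be sketched as a small Python simulation. It is only an illustration under stated assumptions: the `Processor` class, its method names, and the `interrupted_attempts` parameter are hypothetical, an interrupt is modeled simply as clearing intr_flag, and fault handling is not modeled.

```python
class Processor:
    """Toy model of one processor's intr_flag and memory."""

    def __init__(self):
        self.intr_flag = 0
        self.memory = {}

    def rs(self):
        """Model of CALL_PAL RS: set the per-processor intr_flag."""
        self.intr_flag = 1

    def interrupt(self):
        """Assumed model of an interrupt or exception: it clears intr_flag."""
        self.intr_flag = 0

    def amovrr(self, addr1, value1, addr2, value2):
        """Model of CALL_PAL AMOVRR; the return value plays the role of R18."""
        if self.intr_flag == 0:
            return 0                    # sequence was interrupted: no stores
        self.intr_flag = 0
        self.memory[addr1] = value1     # both stores are done together
        self.memory[addr2] = value2
        return 1


def store_pair(cpu, addr1, value1, addr2, value2, interrupted_attempts=()):
    """Retry the RS ... AMOVRR sequence until it succeeds; return the attempt count."""
    attempt = 0
    while True:
        attempt += 1
        cpu.rs()                                    # start of the sequence
        if attempt in interrupted_attempts:
            cpu.interrupt()                         # simulated interruption
        if cpu.amovrr(addr1, value1, addr2, value2):
            return attempt                          # success path (R18 = 1)
        # failure path: loop back to the RS, as the note describes
```

In translated code the loop body would be straight-line instructions between the RS and the AMOVxx; the simulation only shows why a zero in R18 sends control back to the RS.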


2.5 Unprivileged PALcode Thread Instructions


The PALcode thread instructions provide support for multithread implementations,
which require that a given thread be able to generate a reproducible unique value in
a "timely" fashion. This value can then be used to index into a structure or otherwise
generate further thread unique data.


The two instructions in Table 2-4 are provided to read and write a process unique
value from the process's hardware context.


Table 2-4: Unprivileged PALcode Thread Instructions

Mnemonic      Operation

READ_UNQ      Read unique context
WRITE_UNQ     Write unique context


The process unique value is stored in the HWPCB at [HWPCB+72] when the process
is not active. When the process is active, the process unique value can be cached in
hardware internal storage or resident in the HWPCB only.


2.5.1 Read Unique Context


Format:


CALL_PAL READ_UNQ      !PALcode format


Operation:


IF {internal storage for process unique context} THEN
    R0 ← {process unique context}
ELSE
    R0 ← (HWPCB+72)


Exceptions: 


None 


Instruction Mnemonics:


CALL_PAL READ_UNQ      Read Unique Context


Description:


The READ_UNQ instruction causes the hardware process (thread) unique context
value to be placed in R0. If this value has not previously been written using a
CALL_PAL WRITE_UNQ or stored into the quadword in the HWPCB at [HWPCB+72]
while the thread was inactive, then the result returned in R0 is UNPREDICTABLE.
Implementations can cache this unique context value while the hardware process is
active. The unique context may be thought of as a "slow register". Typically, this
value will be used by software to establish a unique context for a given thread of
execution.


2.5.2 Write Unique Context


Format: 


CALL_PAL WRITE_UNQ      !PALcode format


Operation:


! R16 contains value to be written to the hardware process
! unique context

IF {internal storage for process unique context} THEN
    {process unique context} ← R16
ELSE
    (HWPCB+72) ← R16


Exceptions: 


None 


Instruction Mnemonics:


CALL_PAL WRITE_UNQ      Write Unique Context


Description:


The WRITE_UNQ instruction causes the value of R16 to be stored in internal
storage for hardware process (thread) unique context, if implemented, or in the
HWPCB at [HWPCB+72], if the internal storage is not implemented. When the
process is context switched, SWPCTX ensures this value is stored in the HWPCB
at [HWPCB+72]. Implementations can cache this unique context value in internal
storage while the hardware process is active. The unique context may be thought
of as a "slow register". Typically, this value will be used by software to establish a
unique context for a given thread of execution.
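

The two storage options for the unique value can be pictured with a short Python sketch. It is an informal model only: the class `HardwareProcess`, its method names, and the dictionary standing in for the HWPCB are hypothetical, and a read before any write returns `None` where the architecture says the result is UNPREDICTABLE.

```python
HWPCB_UNQ_OFFSET = 72   # byte offset of the unique value within the HWPCB


class HardwareProcess:
    """Toy model of READ_UNQ / WRITE_UNQ with optional internal storage."""

    def __init__(self, has_internal_storage):
        self.has_internal_storage = has_internal_storage
        self.cached_unq = None      # stands in for hardware internal storage
        self.hwpcb = {}             # maps byte offset -> quadword value

    def write_unq(self, value):
        """Model of CALL_PAL WRITE_UNQ (value plays the role of R16)."""
        if self.has_internal_storage:
            self.cached_unq = value
        else:
            self.hwpcb[HWPCB_UNQ_OFFSET] = value

    def read_unq(self):
        """Model of CALL_PAL READ_UNQ (the result plays the role of R0)."""
        if self.has_internal_storage:
            return self.cached_unq
        return self.hwpcb.get(HWPCB_UNQ_OFFSET)

    def swap_out(self):
        """Model of the SWPCTX step that writes a cached value to the HWPCB."""
        if self.has_internal_storage:
            self.hwpcb[HWPCB_UNQ_OFFSET] = self.cached_unq
```

Either way, software sees the same value through READ_UNQ; the only difference is whether the HWPCB copy is current while the process is active.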


2.6 Privileged PALcode Instructions


Privileged instructions can be called in Kernel mode only; otherwise, a privileged
instruction exception occurs. The following privileged instructions are provided:


Table 2-5: PALcode Privileged Instructions Summary

Mnemonic      Operation

CFLUSH        Cache flush
DRAINA        Drain aborts
              See Common Architecture, Chapter 6
HALT          Halt processor
              See Common Architecture, Chapter 6
LDQP          Load quadword physical
MFPR          Move from processor register
MTPR          Move to processor register
STQP          Store quadword physical
SWPCTX        Swap privileged context


2.6.1 Cache Flush


Format: 


CALL_PAL CFLUSH      !PALcode format


Operation:


! R16 contains the Page Frame Number (PFN)
! of the page to be flushed

IF PS<CM> NE 0 THEN
    {privileged instruction exception}

{Flush page out of cache(s)}


Exceptions:


Privileged Instruction


Instruction Mnemonics:


CALL_PAL CFLUSH      Cache Flush


Description: 


The CFLUSH instruction may be used to flush an entire physical page specified by 
the PFN in R16 from any data caches associated with the current processor. All 
processors must implement this instruction. 


On processors which implement a backup power option which maintains only the 
contents of memory in the event of a powerfail, this instruction is used by the 
powerfail interrupt handler to force data written by the handler to the battery backed 
up main memory. After a CFLUSH, the first subsequent load (on the same processor) 
to an arbitrary address in the target page is either fetched from physical memory or 
from the data cache of another processor. 


Note that in some multiprocessor systems, CFLUSH is not sufficient to ensure that 
the data are actually written to memory and not exchanged between processor 
caches. Additional platform-specific cooperation between the powerfail interrupt 
handlers executing on each processor may be required. 


On systems which implement other backup power options (including none), CFLUSH
may return without affecting the data cache contents.


To order CFLUSH properly with respect to preceding writes, an MB instruction is
needed before the CFLUSH; to order CFLUSH properly with respect to subsequent
reads, an MB instruction is needed after the CFLUSH.


2.6.2 Load Quadword Physical


Format:


CALL_PAL LDQP      !PALcode format


Operation:


! R16 contains the quadword aligned physical address
! R0 receives the data from memory

IF PS<CM> NE 0 THEN
    {Privileged Instruction exception}

R0 ← (R16) {physical access}


Exceptions:


Privileged Instruction


Instruction Mnemonics:


CALL_PAL LDQP      Load Quadword Physical


Description:


The LDQP instruction fetches and writes to R0 the quadword aligned memory
operand, whose physical address is in R16.


If the operand address in R16 is not quadword aligned, the result is
UNPREDICTABLE.


2.6.3 Move From Processor Register


Format:


CALL_PAL MFPR_IPR_Name      !PALcode format


Operation: 


IF PS<CM> NE 0 THEN
    {privileged instruction exception}

! R16 may contain an IPR specific source operand
{R0 ← result of IPR specific function}


Exceptions: 


Privileged Instruction 


Instruction Mnemonics:


CALL_PAL MFPR_xxx      Move from Processor Register xxx


Description:


The MFPR_xxx instruction reads the internal processor register specified by the
PALcode function field and writes it to R0.


Registers R1, R16, and R17 contain unpredictable results after an MFPR.


See Chapter 5 for a description of each IPR.


2.6.4 Move to Processor Register


Format:


CALL_PAL MTPR_IPR_Name      !PALcode format


Operation: 


IF PS<CM> NE 0 THEN
    {privileged instruction exception}

! R16 may contain an IPR specific source operand
{R0 ← result of IPR specific function}
{IPR ← result of IPR specific function}


Exceptions: 


Privileged Instruction 


Instruction Mnemonics: 


CALL_PAL MTPR_xxx      Move to Processor Register xxx


Description:


The MTPR_xxx instruction writes the IPR-specific source operands in integer
registers R16 and R17 (R17 reserved for future use) to the internal processor register
specified by the PALcode function field. The effect of loading a processor register is
guaranteed to be active on the next instruction.


Registers R1, R16, and R17 contain unpredictable results after an MTPR. The MTPR
may return results in R0. If the specific IPR being accessed does not return results
in R0, then R0 contains an unpredictable result after an MTPR.


See Chapter 5 for a description of each IPR. 







2.6.5 Store Quadword Physical 
Format:


CALL_PAL STQP      !PALcode format


Operation: 


! R16 contains the quadword aligned physical address
! R17 contains the data to be written

IF PS<CM> NE 0 THEN
    {Privileged Instruction exception}

(R16) ← R17 {physical access}


Exceptions: 
Privileged Instruction 
Instruction Mnemonics: 
CALL_PAL STQP Store Quadword Physical 


Description: 


The STQP instruction writes the quadword contents of R17 to the memory location
whose physical address is in R16.


If the operand address in R16 is not quadword aligned, the result is
UNPREDICTABLE.


2.6.6 Swap Privileged Context


Format:


CALL_PAL SWPCTX      !PALcode format


Operation: 


! R16 contains the physical address of the new HWPCB.
! check HWPCB alignment

IF R16<6:0> NE 0 THEN
    {reserved operand exception}
IF {PS<CM> NE 0} THEN
    {privileged instruction exception}

! Store old HWPCB contents

(IPR_PCBB + HWPCB_KSP) ← SP
IF {internal registers for stack pointers} THEN
    BEGIN
    (IPR_PCBB + HWPCB_ESP) ← IPR_ESP
    (IPR_PCBB + HWPCB_SSP) ← IPR_SSP
    (IPR_PCBB + HWPCB_USP) ← IPR_USP
    END

IF {internal registers for ASTxx} THEN
    BEGIN
    (IPR_PCBB + HWPCB_ASTSR) ← IPR_ASTSR
    (IPR_PCBB + HWPCB_ASTEN) ← IPR_ASTEN
    END
tmp1 ← PCC
tmp2 ← ZEXT(tmp1<31:0>)
tmp3 ← ZEXT(tmp1<63:32>)
(IPR_PCBB + HWPCB_PCC) ← {tmp2 + tmp3}<31:0>
IF {internal storage for process unique value} THEN
    BEGIN
    (IPR_PCBB + HWPCB_UNQ) ← process unique value
    END

! Load new HWPCB contents

IPR_PCBB ← R16

IF {ASNs not implemented in virtual instruction cache} THEN
    {flush instruction cache}

IF {ASNs not implemented in TB} THEN
    IF {IPR_PTBR NE (IPR_PCBB + HWPCB_PTBR)} THEN
        {invalidate trans. buffer entries with PTE<ASM> EQ 0}
ELSE
    IPR_ASN ← (IPR_PCBB + HWPCB_ASN)

SP ← (IPR_PCBB + HWPCB_KSP)
IF {internal registers for stack pointers} THEN
    BEGIN
    IPR_ESP ← (IPR_PCBB + HWPCB_ESP)
    IPR_SSP ← (IPR_PCBB + HWPCB_SSP)
    IPR_USP ← (IPR_PCBB + HWPCB_USP)
    END

IPR_PTBR ← (IPR_PCBB + HWPCB_PTBR)

IF {internal registers for ASTxx} THEN
    BEGIN
    IPR_ASTSR ← (IPR_PCBB + HWPCB_ASTSR)
    IPR_ASTEN ← (IPR_PCBB + HWPCB_ASTEN)
    END

IPR_FEN ← (IPR_PCBB + HWPCB_FEN)
tmp4 ← ZEXT((IPR_PCBB + HWPCB_PCC)<31:0>)
tmp4 ← tmp4 - tmp2
PCC<63:32> ← tmp4<31:0>
IF {internal storage for process unique value} THEN
    BEGIN
    process unique value ← (IPR_PCBB + HWPCB_UNQ)
    END
IF {internal storage for Data Alignment trap setting} THEN
    BEGIN
    DAT ← (IPR_PCBB + HWPCB_DAT)
    END


Exceptions: 
Reserved Operand
Privileged Instruction 


Instruction Mnemonics: 


CALL_PAL SWPCTX Swap Privileged Context
Description: 


The SWPCTX instruction returns ownership of the current Hardware Privileged
Context Block (HWPCB) to the operating system and passes ownership of the new
HWPCB to the processor. The HWPCB is described in Chapter 4.

SWPCTX saves the privileged context from the internal processor registers into the
HWPCB specified by the physical address in the PCBB internal processor register.
It then loads the privileged context from the new HWPCB specified by the physical
address in R16. Note that the actual sequence of the save and restore operation is
not specified, so any overlap of the current and new HWPCB storage areas produces
UNDEFINED results.




The privileged context includes the four stack pointers, the Page Table Base Register
(PTBR), the Address Space Number (ASN), the AST enable and summary registers,
the Floating-point enable register (FEN), the Performance monitor (PME) register,
the Data alignment trap (DAT) register, and the process cycle counter (PCC).

However, PTBR is never saved in the HWPCB and it is UNPREDICTABLE whether
or not ASN is saved. These values cannot be changed for a running process. The
process integer and floating registers are saved and restored by the operating system.
See Figure 4-1 for the HWPCB format.

Any change to the current HWPCB while the processor has ownership results in
UNDEFINED operation. All the values in the current HWPCB can be read through
IPRs.


If the HWPCB is read while ownership resides with the processor, it is
UNPREDICTABLE whether the original or an updated value of a field is read. The 
processor is free to update an HWPCB field at any time. The decision as to whether 
or not a field is updated is made individually for each field. 


If the enabling conditions are present for an interrupt at the completion of this 
instruction, the interrupt occurs before the next instruction. 


PALcode sets up the PCBB at boot time to point to the HWPCB storage area in the 
Hardware Restart Parameter Block (HWRPB). \ See Platform Section, Chapter 3. \

The operation is UNDEFINED if SWPCTX accesses a non-memory region. 


A reference to non-existent memory causes a Machine Check. Unimplemented 
physical address bits are SBZ. The operation is UNDEFINED if any of these bits 
are set. 


NOTE
Processors may keep a copy of each of the per-process
stack pointers in internal registers. In those processors,
SWPCTX stores the internal registers into the HWPCB.
Processors that do not keep a copy of the stack pointers
in internal registers keep only the stack pointer for
the current access mode in SP and switch this with
the HWPCB contents whenever the current access mode
changes.
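
The PCC arithmetic in the SWPCTX operation can be sketched in Python. This is an illustrative reading of the pseudocode only: it assumes PCC<31:0> is a free-running counter and PCC<63:32> is a per-process offset, and the function names save_pcc and load_pcc are hypothetical.

```python
M32 = 0xFFFFFFFF

def save_pcc(pcc):
    """Value stored in HWPCB_PCC: {tmp2 + tmp3}<31:0>."""
    counter = pcc & M32            # tmp2 = ZEXT(PCC<31:0>)
    offset = (pcc >> 32) & M32     # tmp3 = ZEXT(PCC<63:32>)
    return (counter + offset) & M32

def load_pcc(pcc, saved):
    """New PCC after loading: PCC<63:32> = (saved - tmp2)<31:0>."""
    counter = pcc & M32            # tmp2, sampled during the save step
    offset = (saved - counter) & M32
    return (offset << 32) | counter
```

Under that assumption, the sum of the two halves of PCC immediately after the load equals the value saved in the new HWPCB, so a process's cycle count resumes where it left off.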
2.7 \REVISION HISTORY
Revision 5.0, May 12, 1992 


1. Changed attempt to acquire secondary lock to retry value
2. Modified RSCC and CFLUSH descriptions
3. Removed DRAINA to common PAL chapter
4. Added ECO #29 GENTRAP
5. Added ECO #27 (octaword aligned queues)
6. Added secondary interlock information
7. Added ECO #31 & #44 (AMOVxx PALcode instructions)
8. Added format editing for instructions
9. Added resident Queue Instructions ECO #28
10. IMB and HALT moved to Common PALcode Section
11. Removed priv inst tests from RSCC (an unpriv instruction)
12. Clean up the format for instructions
13. Converted to SDML
14. Added ECO #21, #23, #26
15. Identify queue type, for Queue instructions
16. Modified REI pseudocode
17. Integrate references for Console ECO #15

Revision 4.0, March 29, 1991 


1. Put in ECO for PAL Thread Instructions
2. Put in ECO requiring current stack be writable for REI instruction
3. Put in ECO requiring REMQUEx/D to return address of removed entry in R1
4. Typos
5. Correct cross reference to section 'Replacement of standard PALcode'
6. Impose uniform usage of CASE pseudocode construct
7. Clarify use of R17 and R0 for MTPR instruction
8. Specify R16 and R17 as integer registers for MTPR instruction
9. Replace occurrences of 'Reserved Operand Exception' with 'Illegal PALcode
   Operand Trap'
10. Clarify that subsettable privileged PAL Instructions can individually either be
    implemented or cause an Illegal Instruction Trap
11. Change references from 'interrupt' to 'AST' in SWASTEN description, and to
    'Interrupt or AST' in REI description
12. Add Privileged Instruction exception to those experienced by CFLUSH and
    DRAINA.
13. Correct titles for INSQHIQ and INSQTIQ Instructions
14. Tweak MFPR_IPR operation definition
15. Add 'Read System Cycle Counter' PALcode description
Revision 3.0, March 2, 1990 

1. Fix Bug in /D version of REMQUEx and INSQUEx
2. Add stack fixup to REI
3. Add Memory Barrier to interlocked queues
4. Add section on replacement of PALcode
5. Add Cflush
6. Rework IFLUSH to IMB
7. Remove PAST
8. Define which PAL may be subsetted


Revision 2.0, October 4, 1989. 

1. Remove test and set/clear interlocked
2. Add deferred addressing to the absolute queues
3. Add drain aborts
4. Add poll AST (PAST)
5. Remove read/write of inexact exception enable
6. Add CC and FEN to SWPCTX
7. Rework interlocked queues for LDQ/L and STQ/C


Revision 1.0, May 23, 1989 
1. First Full Version 


Revision 0.0, March 15, 1989 
1. Initial Version

Chapter 3
OpenVMS Memory Management (II)


3.1 Introduction 


Memory management consists of the hardware and software which control the 
allocation and use of physical memory. Typically, in a multiprogramming system, 
several processes may reside in physical memory at the same time; see Chapter 4. 
OpenVMS Alpha uses memory protection and multiple address spaces to ensure that 
one process will not affect either other processes or the operating system. 


To further improve software reliability, four hierarchical access modes provide
memory access control. They are, from most to least privileged: kernel, executive, 
supervisor, and user. Protection is specified at the individual page level, where a 
page may be inaccessible, read-only, or read/write for each of the four access modes. 
Accessible pages can be restricted to have only data or instruction access. 


A program uses virtual addresses to access its data and instructions. However, before
these virtual addresses can be used to access memory, they must be translated into
physical addresses. Memory management software maintains tables of mapping
information (page tables) that keep track of where each virtual page is located in
physical memory. The processor utilizes this mapping information when it translates
virtual addresses to physical addresses.


Therefore, memory management provides both memory protection and memory 
mapping mechanisms. The OpenVMS Alpha memory management architecture is 
designed to meet several goals: 


• Provide a large address space for instructions and data.
• Allow programs to run on hardware with physical memory smaller than the
  virtual memory used.
• Provide convenient and efficient sharing of instructions and data.
• Allow sparse use of a large address space without excessive page table overhead.
• Contribute to software reliability.
• Provide independent read and write access protection.


3.2 Virtual Address Space 
A virtual address is a 64-bit unsigned integer specifying a byte location within the
virtual address space. Implementations subset the address space supported to one
of four sizes (43, 47, 51, or 55 bits) as a function of page size. The minimal virtual
address size supported is 43 bits. If an implementation supports less than 64-bit
virtual addresses, it must check that all the VA<63:VA_SIZE> bits are equal
to VA<VA_SIZE-1>. This gives two disjoint ranges for valid virtual addresses.
For example, for a 43-bit virtual address space, the valid virtual address ranges
are 0..3FF FFFF FFFF (hex) and FFFF FC00 0000 0000..FFFF FFFF FFFF FFFF (hex).
Accesses to virtual addresses outside of the valid virtual address ranges for an
implementation cause an access violation exception.
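
The sign-extension check described above can be illustrated with a short Python sketch. The function name is_valid_va is hypothetical and the sketch only models the check, not any hardware behavior.

```python
def is_valid_va(va, va_size=43):
    """True if VA<63:va_size> all equal VA<va_size-1> for a 64-bit address."""
    va &= (1 << 64) - 1
    upper = va >> (va_size - 1)                  # bits <63:va_size-1>
    all_ones = (1 << (64 - va_size + 1)) - 1     # same field with every bit set
    return upper == 0 or upper == all_ones
```

With va_size=43 the function accepts exactly the two ranges given in the example: the low range ending at 3FF FFFF FFFF and the high range starting at FFFF FC00 0000 0000.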


The virtual address space is broken into pages, which are the units of relocation,
sharing, and protection. The page size ranges from 8K bytes to 64K bytes.
System software should, therefore, allocate regions with differing protection on
64-Kbyte virtual address boundaries to ensure image compatibility across all Alpha
implementations.


Memory management provides the mechanism to map the active part of the virtual 
address space to the available physical address space. The operating system controls 
the virtual-to-physical address mapping tables, and saves the inactive parts of the 
virtual address space on external storage media. 


3.2.1 Virtual Address Format 


The processor generates a 64-bit virtual address for each instruction and operand
in memory. The virtual address consists of three level-number fields and a
byte_within_page field.


Figure 3-1: Virtual Address Format

[Figure: fields, from high-order to low-order bits, are SEXT(Level1<Level_Size-1>),
Level1, Level2, Level3, and byte_within_page.]


The byte_within_page field can be either 13, 14, 15, or 16 bits depending on a
particular implementation. Thus, the allowable page sizes are 8K bytes, 16K bytes,
32K bytes, and 64K bytes. Each level-number field contains bits 0 through n, where
n is, for example, 9 with an 8K-byte page size. The level-number fields are the same
size for a given implementation.





The level number fields are a function of the page size; all page table entries at any
given level do not exceed one page. The PFN field in the PTE is always 32 bits wide.
Thus, as the page size grows the virtual and physical address size also grows.


Table 3-1: Virtual Address Options

Page      Byte     Level    Virtual   Physical
Size      Offset   Size     Address   Address
(bytes)   (bits)   (bits)   (bits)    (bits)

8K        13       10       43        45
16K       14       11       47        46
32K       15       12       51        47
64K       16       13       55        48
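
The rows of Table 3-1 follow from the page size alone. The Python sketch below reproduces them, assuming (as the surrounding text states) three equal level fields, quadword PTEs that fill exactly one page per table, and a 32-bit PFN. The function name address_options is hypothetical.

```python
PTE_BYTES = 8    # a PTE is one quadword
PFN_BITS = 32    # PFN field width stated in the text

def address_options(page_size):
    """Return (byte offset, level size, virtual, physical) widths in bits."""
    offset_bits = page_size.bit_length() - 1            # lg(page_size)
    level_bits = offset_bits - 3                         # lg(page_size / PTE_BYTES)
    va_bits = 3 * level_bits + offset_bits               # three levels plus offset
    pa_bits = PFN_BITS + offset_bits                     # PFN concatenated with offset
    return offset_bits, level_bits, va_bits, pa_bits

for kbytes in (8, 16, 32, 64):
    print(kbytes, address_options(kbytes * 1024))
```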


3.3 Physical Address Space 


Physical addresses are at most 48 bits. A processor may choose to implement a
smaller physical address space by not implementing some number of high order
bits. The two most significant implemented physical address bits select a caching
policy or implementation dependent type of address space. Implementations will use
these bits as appropriate for their systems. For example, in a workstation with a
30-bit physical address space, bit <29> might select between memory and
non-memory-like regions, and bit <28> could enable or disable caching; see Common
Architecture, Chapter 5.


3.4 Memory Management Control 


Memory management is always enabled. Implementations must provide an 
environment for PALcode to service exceptions and to initialize and boot the 
processor. For example, PALcode might run with I-stream mapping disabled and
use the privileged CALL_PAL LDQP and STQP instructions to access data stored in
physical addresses.


3.5 Page Table Entries 


The processor uses a quadword Page Table Entry (PTE) to translate virtual addresses 
to physical addresses. A PTE contains hardware and software control information 
and the physical Page Frame Number. 


Figure 3-2: Page Table Entry

[Figure: quadword PTE layout, bits <63:0>; the individual fields are listed in
Table 3-2.]

Fields in the page table entry are interpreted as shown in Table 3-2.


Table 3-2: Page Table Entry 
Bits Description
0 Valid (V)

Indicates the validity of the PFN field. When V is set, the PFN field is valid for
use by hardware. When V is clear, the PFN field is reserved for use by software.
The V bit does not affect the validity of PTE<15:1> bits.

1 Fault On Read (FOR)


When set, a Fault On Read exception occurs on an attempt to read any location in 
the page. 


2 Fault On Write (FOW) 


When set, a Fault On Write exception occurs on an attempt to write any location 
in the page. 


3 Fault On Execute (FOE) 


When set, a Fault On Execute exception occurs on an attempt to execute an 
instruction in the page. 


4 Address Space Match (ASM) 


When set, this PTE matches all Address Space Numbers. For a given VA, 
ASM must be set consistently in all processes, otherwise the address mapping 
is UNPREDICTABLE. 
6:5 Granularity hint (GH)

Software may set these bits to a non-zero value to supply a hint to translation 
buffer implementations that a block of pages can be treated as a single larger 
page: 

1. The block is an aligned group of 8**N pages, where N is the value of PTE<6:5>,
e.g. a group of 1, 8, 64, or 512 pages starting at a virtual address with
page_size + 3*N low-order zeros.

2. The block is a group of physically contiguous pages that are aligned both
virtually and physically. Within the block, the low 3*N bits of the PFNs
describe the identity mapping and the high 32-3*N PFN bits are all equal.


3. Within the block, all PTEs have the same values for bits <15:0>, i.e. protection, 
fault, granularity, and valid bits. 


Hardware may use this hint to map the entire block with a single TB entry, instead 
of 8, 64, or 512 separate TB entries. 


Note that it is UNPREDICTABLE which PTE values within the block are used if 
the granularity bits are set inconsistently. 


PROGRAMMING NOTE 
A granularity hint might be appropriate for a large memory structure
such as a frame buffer or nonpaged pool that in fact is mapped into
contiguous virtual pages with identical protection, fault, and valid
bits.


7 Reserved for future use by DIGITAL.


PROGRAMMING NOTE 
The reserved bit will be used by future 
hardware systems and should not be 
used by software even if PTE<V> is 
clear. 


8 Kernel Read Enable (KRE)


This bit enables reads from kernel mode. If this bit is a 0 and a LOAD or 
instruction fetch is attempted while in kernel mode, an Access Violation occurs. 
This bit is valid even when V=0. 


9 Executive Read Enable (ERE)


This bit enables reads from executive mode. If this bit is a 0 and a LOAD or 
instruction fetch is attempted while in executive mode, an Access Violation occurs. 
This bit is valid even when V=0. 
10 Supervisor Read Enable (SRE)


This bit enables reads from supervisor mode. If this bit is a 0 and a LOAD or 
instruction fetch is attempted while in supervisor mode, an Access Violation occurs. 
This bit is valid even when V=0. 


11 User Read Enable (URE)

This bit enables reads from user mode. If this bit is a 0 and a LOAD or instruction
fetch is attempted while in user mode, an Access Violation occurs. This bit is valid
even when V=0.

12 Kernel Write Enable (KWE)


This bit enables writes from kernel mode. If this bit is a 0 and a STORE is 
attempted while in kernel mode, an Access Violation occurs. This bit is valid even 
when V=0. 


13 Executive Write Enable (EWE)


This bit enables writes from executive mode. If this bit is a 0 and a STORE is 
attempted while in executive mode, an Access Violation occurs. This bit is valid 
even when V=0. 


14 Supervisor Write Enable (SWE)

This bit enables writes from supervisor mode. If this bit is a 0 and a STORE is
attempted while in supervisor mode, an Access Violation occurs. This bit is valid
even when V=0.


15 User Write Enable (UWE)


This bit enables writes from user mode. If this bit is a 0 and a STORE is attempted 
while in user mode, an Access Violation occurs. This bit is valid even when V=0. 


NOTE 
If a write enable bit is set and 
the corresponding read enable bit is 
not, the operation of the processor is 
UNDEFINED. 
31:16 Reserved for software.

63:32 Page Frame Number (PFN)

The PFN field always points to a page boundary. If V is set, the PFN is
concatenated with the byte_within_page bits of the virtual address to obtain the
physical address; see Section 3.7. If V is clear, this field may be used by software.
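
As an illustration of the Table 3-2 layout, the following Python sketch unpacks a quadword PTE into named fields. The bit positions follow the table (V in bit 0, the fault bits in <3:1>, ASM in bit 4, GH in <6:5>, KRE through UWE in bits 8 through 15, PFN in <63:32>). The names PTE_BITS and decode_pte, and the derived gh_block_pages entry, are hypothetical conveniences and not part of the architecture.

```python
PTE_BITS = {
    "V": 0, "FOR": 1, "FOW": 2, "FOE": 3, "ASM": 4,
    "KRE": 8, "ERE": 9, "SRE": 10, "URE": 11,
    "KWE": 12, "EWE": 13, "SWE": 14, "UWE": 15,
}

def decode_pte(pte):
    """Split a 64-bit PTE into its single-bit flags, GH, and PFN."""
    fields = {name: (pte >> bit) & 1 for name, bit in PTE_BITS.items()}
    fields["GH"] = (pte >> 5) & 0x3                 # granularity hint, PTE<6:5>
    fields["PFN"] = (pte >> 32) & 0xFFFFFFFF        # page frame number, PTE<63:32>
    fields["gh_block_pages"] = 8 ** fields["GH"]    # 1, 8, 64, or 512 pages
    return fields
```

For example, decode_pte((0x1234 << 32) | (2 << 5) | (1 << 12) | (1 << 8) | 1) describes a valid, kernel read/write page in a 64-page granularity block with PFN 1234 (hex).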


3.5.1 Changes to Page Table Entries 


The operating system changes PTEs as part of its memory management functions. 
For example, the operating system may set or clear the valid bit, change the PFN 
field as pages are moved to and from external storage media, or modify the software
bits. The processor hardware never changes PTEs. 
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Software must guarantee that each PTE is always consistent within itself. Changing 
a PTE one field at a time may give incorrect system operation, e.g., setting PTE<V> 
with one instruction before establishing PTE<PFN> with another. Execution of an 
interrupt service routine between the two instructions could use an address that 
would map using the inconsistent PTE. Software can solve this problem by building 
a complete new PTE in a register and then moving the new PTE to the page table 
using a Store Quadword instruction (STQ).


Multiprocessing makes the problem more complicated. Another processor could be 
reading (or even changing) the same PTE that the first processor is changing. Such 
concurrent access must produce consistent results. Software must use some form of 
software synchronization to modify PTEs that are already valid. Once a processor 
has modified a valid PTE, it is possible that other processors in a multiprocessor 
system may have old copies of that PTE in their Translation Buffer. Software must 
inform other processors of changes to PTEs. 


Software may write new values into invalid PTEs using quadword store instructions 
(i.e., STQ). Hardware must ensure that aligned quadword reads and writes are 
atomic operations. The following procedure must be used to change any of the PTE 
bits <15:0> of a shared valid PTE (PTE<0>=1) such that an access that was allowed 
before the change is not allowed after the change. 


1. The PTE<0> is cleared without changing any of the PTE bits <63:32> and <15:1>.


2. All processors do a TBIS for the VA mapped by the PTE that changed. The VA 
used in the TBIS must assume that the PTE Granularity hint bits are zero. 


3. After all processors have done the TBIS, the new PTE may be written changing 
any or all fields. 


PROGRAMMING NOTE 
The procedure above allows the QUEUE instructions 
that have probed to check that all can complete, to 
service a TB miss. The QUEUE instruction will use the 
PTE even though the V bit is clear, if during its initial 
probe flow the V bit was set. 
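
The three-step procedure for restricting access through a shared valid PTE can be sketched with a toy model in Python. Everything here is illustrative: the Processor class, its dictionary translation buffer, and change_shared_pte are hypothetical stand-ins, and real software would need interprocessor signalling and synchronization that the sketch does not model.

```python
class Processor:
    def __init__(self):
        self.tb = {}                     # toy translation buffer: VA page -> PTE

    def tbis(self, va_page):
        self.tb.pop(va_page, None)       # invalidate a single TB entry

def change_shared_pte(page_table, processors, va_page, new_pte):
    # Step 1: clear PTE<0> only, leaving <63:32> and <15:1> untouched.
    page_table[va_page] = page_table[va_page] & ~1
    # Step 2: every processor performs a TBIS for the affected VA.
    for cpu in processors:
        cpu.tbis(va_page)
    # Step 3: write the complete new PTE in a single store.
    page_table[va_page] = new_pte
```

Each page table assignment stands for one aligned quadword store, which is why the new PTE is built in full before it is written.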


3.6 Memory Protection 


Memory protection is the function of validating whether a particular type of access 
is allowed to a specific page from a particular access mode. Access to each page is 
controlled by a protection code that specifies, for each access mode, whether read or 
write references are allowed. 


The processor uses the following to determine whether an intended access is allowed: 
• The virtual address, which is used to index page tables.
• The intended access type (read data, write data, or instruction fetch).
• The current access mode from the Processor Status.

If the access is allowed and the address can be mapped (the Page Table Entry
is valid), the result is the physical address corresponding to the specified virtual 
address. 


For protection checks, the intended access is read for data loads and instruction 
fetch, and write for data stores. 


If an operand is an address operand, then no reference is made to memory. Hence, 
the page need not be accessible nor map to a physical page.


3.6.1 Processor Access Modes 
There are four processor modes: 
• Kernel
• Executive
• Supervisor
• User


The access mode of a running process is stored in the Current Mode bits of the 
Processor Status (PS); see Section 6.2. 
3.6.2 Protection Code 


Every page in the virtual address space is protected according to its use. A program 
may be prevented from reading or writing portions of its address space. Associated 
with each page is a protection code that describes the accessibility of the page for 
each processor mode. The code allows a choice of read or write protection for each 
processor mode. 


• Each mode's access can be read/write, read-only, or no-access.
• Read and write accessibility are specified independently.
• The protection of each mode can be specified independently.

The protection code is specified by 8 bits in the PTE; see Table 3-2.


The OpenVMS Alpha architecture allows a page to be designated as execute only by 
setting the read enable bit for the access mode and by setting the fault on read and 
write bits in the PTE. 


3.6.3 Access Violation Fault

An Access Violation fault occurs if an illegal access is attempted, as determined by
the current processor mode and the page's protection field.


3.7 Address Translation 


The page tables can be accessed from physical memory, or (to reduce overhead)
through a mapping to a linear region of the virtual address space. All 
implementations must support the virtual access method and are expected to use it 
as the primary access method to enhance performance. 
The following sections describe both access methods.


3.7.1 Physical Access for Page Table Entries 


Physical address translation is performed by accessing entries in a three-level page 
table structure. The Page Table Base Register (PTBR) contains the physical Page
Frame Number of the highest level (Level 1) page table. Bits <level1> of the virtual
address are used to index into the first level page table to obtain the physical page 
frame number of the base of the second level (Level 2) page table. Bits <level2> of 
the virtual address are used to index into the second level page table to obtain the 
physical page frame number of the base of the third level (Level 3) page table. Bits 
<level3> of the virtual address are used to index the third level page table to obtain 
the physical Page Frame Number (PFN) of the page being referenced. The PFN is 
concatenated with virtual address bits <byte_within_page> to obtain the physical 
address of the location being accessed. 


If part of any page table resides in I/O space, or in nonexistent memory, the operation 
of the processor is UNDEFINED. 


If the first-level or second-level PTE is valid, the protection bits are ignored; the 
protection code in the third-level PTE is used to determine accessibility. If a first- 
level or second-level PTE is invalid, an Access Violation occurs if the PTE<KRE> 
equals zero. An Access Violation on a first-level or second-level PTE implies that all 
lower-level page tables mapped by that PTE do not exist. 


PROGRAMMING NOTE 

This mapping scheme does not require multiple 
contiguous physical pages. There are no length 
registers. With a page size of 8K bytes, 3 pages (24K 
bytes) map 8M bytes of virtual address space; 1026 
pages (approximately 8M bytes) map an 8-Gbyte address 
space; and 1,049,601 pages (approximately 8G bytes) 
map the entire 8T-byte (2**43 byte) address space.
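
The page counts in the programming note above can be checked with a few lines of Python. The function name page_table_pages is hypothetical; the arithmetic assumes one Level 1 page, quadword PTEs, and fully populated lower levels.

```python
def page_table_pages(page_size=8192, pte_bytes=8):
    """Return (pages of page table, bytes mapped) for three population levels."""
    n = page_size // pte_bytes                       # PTEs per page table page
    one_level3_page = (1 + 1 + 1, n * page_size)
    one_level2_page = (1 + 1 + n, n**2 * page_size)
    everything = (1 + n + n**2, n**3 * page_size)
    return [one_level3_page, one_level2_page, everything]

for pages, mapped in page_table_pages():
    print(pages, "pages map", mapped, "bytes")
```

For an 8K-byte page size this gives 3, 1026, and 1,049,601 pages mapping 2**23, 2**33, and 2**43 bytes respectively.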


The algorithm to generate a physical address from a virtual address follows: 
IF {SEXT(VA<63:VA_SIZE>) NEQ SEXT(VA<VA_SIZE-1>)} THEN
    {initiate Access Violation fault}

! Read Physical
level1_pte ← ({PTBR * page_size} + {8 * VA<level1_number>})

IF level1_pte<V> EQ 0 THEN
    IF level1_pte<KRE> EQ 0 THEN
        {initiate Access Violation fault}
    ELSE
        {initiate Translation Not Valid fault}

! Read Physical
level2_pte ← ({level1_pte<PFN> * page_size} + {8 * VA<level2_number>})

IF level2_pte<V> EQ 0 THEN
    IF level2_pte<KRE> EQ 0 THEN
        {initiate Access Violation fault}
    ELSE
        {initiate Translation Not Valid fault}

! Read Physical
level3_pte ← ({level2_pte<PFN> * page_size} + {8 * VA<level3_number>})

IF {{{level3_pte<UWE> EQ 0} AND {write access} AND {PS<CM> EQ 3}} OR
    {{level3_pte<URE> EQ 0} AND {read access}  AND {PS<CM> EQ 3}} OR
    {{level3_pte<SWE> EQ 0} AND {write access} AND {PS<CM> EQ 2}} OR
    {{level3_pte<SRE> EQ 0} AND {read access}  AND {PS<CM> EQ 2}} OR
    {{level3_pte<EWE> EQ 0} AND {write access} AND {PS<CM> EQ 1}} OR
    {{level3_pte<ERE> EQ 0} AND {read access}  AND {PS<CM> EQ 1}} OR
    {{level3_pte<KWE> EQ 0} AND {write access} AND {PS<CM> EQ 0}} OR
    {{level3_pte<KRE> EQ 0} AND {read access}  AND {PS<CM> EQ 0}}}
THEN
    {initiate Access Violation fault}
ELSE
    IF level3_pte<V> EQ 0 THEN
        {initiate Translation Not Valid fault}

IF {level3_pte<FOW> EQ 1} AND {write access} THEN
    {initiate Fault On Write fault}
IF {level3_pte<FOR> EQ 1} AND {read access} THEN
    {initiate Fault On Read fault}
IF {level3_pte<FOE> EQ 1} AND {execute access} THEN
    {initiate Fault On Execute fault}

Physical Address ← {level3_pte<PFN> * page_size} OR VA<byte_within_page>
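
The algorithm above can be modelled in Python for an 8K-byte page size. This is a sketch under stated assumptions, not an implementation of any processor: physical memory is a dictionary from physical address to quadword, faults are raised as a hypothetical MMFault exception, the enable-bit positions are those of Table 3-2, and an instruction fetch is checked against the read-enable bit for protection but only against FOE for the fault-on bits.

```python
class MMFault(Exception):
    pass

PAGE_SIZE, OFFSET_BITS, LEVEL_BITS, VA_SIZE = 8192, 13, 10, 43
READ_BIT = {0: 8, 1: 9, 2: 10, 3: 11}      # KRE, ERE, SRE, URE by PS<CM>
WRITE_BIT = {0: 12, 1: 13, 2: 14, 3: 15}   # KWE, EWE, SWE, UWE by PS<CM>
FAULT_BIT = {"read": 1, "write": 2, "execute": 3}

def translate(mem, ptbr, va, mode, access):
    """Walk three levels; access is 'read', 'write', or 'execute'."""
    top = va >> (VA_SIZE - 1)
    if top not in (0, (1 << (64 - VA_SIZE + 1)) - 1):
        raise MMFault("Access Violation")          # VA is not sign extended
    pfn = ptbr
    for level in (1, 2, 3):
        shift = OFFSET_BITS + LEVEL_BITS * (3 - level)
        index = (va >> shift) & ((1 << LEVEL_BITS) - 1)
        pte = mem[pfn * PAGE_SIZE + 8 * index]     # physical read of the PTE
        if level < 3:
            if not pte & 1:                        # invalid upper-level PTE
                kre = (pte >> 8) & 1
                raise MMFault("Translation Not Valid" if kre else "Access Violation")
            pfn = pte >> 32
    enable = WRITE_BIT[mode] if access == "write" else READ_BIT[mode]
    if not (pte >> enable) & 1:
        raise MMFault("Access Violation")
    if not pte & 1:
        raise MMFault("Translation Not Valid")
    if (pte >> FAULT_BIT[access]) & 1:
        raise MMFault("Fault On " + access.capitalize())
    return ((pte >> 32) * PAGE_SIZE) | (va & (PAGE_SIZE - 1))
```

The protection check precedes the validity check at the third level, matching the pseudocode, so an inaccessible invalid page reports an Access Violation rather than Translation Not Valid.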


3.7.2 Virtual Access for Page Table Entries 


To reduce the overhead associated with the address translation in a three-level page
table structure, the page tables are mapped into a linear region of the virtual address
space. The virtual address of the base of the page table structure is set on a
system-wide basis and is contained in the VPTB IPR.


When a native mode DTB or ITB Miss occurs, the TBMISS flows attempt to load 
the level three page table entry using a single virtual mode load instruction. 


The algorithm involving the manipulation of the missing VA is: 


tmp ← left_shift(VA, {64 - {{lg(PageSize)*4} - 9}})
tmp ← right_shift(tmp, {64 - {{lg(PageSize)*4} - 9} + lg(PageSize) - 3})
tmp ← VPTB OR tmp
tmp<2:0> ← 0


At this point, tmp contains the VA of the level 3 page table entry. A LDQ from that
VA will result in the acquisition of the PTE needed to satisfy the initial TBMISS
condition.
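
The shift sequence can be written out in Python to show what it computes. The function name level3_pte_va is hypothetical, and 64-bit register behavior is modelled by masking after the left shift.

```python
MASK64 = (1 << 64) - 1

def level3_pte_va(va, vptb, lg_page_size=13):
    """Virtual address of the level 3 PTE that maps va."""
    va_size = 4 * lg_page_size - 9                   # 43 for 8K-byte pages
    tmp = (va << (64 - va_size)) & MASK64            # discard the sign-extension bits
    tmp >>= (64 - va_size) + lg_page_size - 3        # leave the page number times 8
    tmp |= vptb
    return tmp & ~0x7 & MASK64                       # tmp<2:0> = 0
```

With 8K-byte pages and a VPTB of 2 0000 0000 (hex), the PTE for virtual page 5 is found at 2 0000 0028 (hex), that is, VPTB plus 5 quadwords.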
However, in the PALcode environment, if a TBMISS occurs during an attempt
to fetch the level3 PTE, then it is necessary to use the longer sequence of three
dependent loads described in Section 3.7.


Chapter 5 contains the description of the VPTB IPR used to contain the virtual 
address of the base of the page table structure. 


The mapping of the page tables necessary for the correct function of the algorithm 
is done as follows: 


1. Select a 2**((3*lg(page_size/8))+3) byte-aligned region (an address with
3*lg(page_size/8)+3 low-order zeros) in the virtual address space. This value will
be written into the VPTB register.


2. Create a level1 PTE to map the page tables as follows:

Level1_PTE ← 0                   ! Init all fields to 0
Level1_PTE<63:32> ← PFN of Level1 Pagetable
                                 ! Set PFN to PFN of level1 pagetable
Level1_PTE<8> ← 1                ! Kernel Read Enable (KRE)
Level1_PTE<0> ← 1                ! Valid bit

3. Write the created level1 PTE into the Level1 page table entry that corresponds
to the VPTB value.

4. Set all Level1 and Level2 Valid PTEs to allow kernel read access.

5. Write the VPTB register with the selected base value.


NOTE 
No validity checks need be made on the value stored 
in the VPTB in a running system. Therefore, if the 
VPTB contains an invalid address, the operation is 
UNDEFINED. 


3.8 Translation Buffer 


In order to save actual memory references when repeatedly referencing the 
same pages, hardware implementations include a translation buffer to remember 
successful virtual address translations and page states. 


When the process context is changed, a new value is loaded into the Address 
Space Number (ASN) internal processor register with a Swap Privileged Context 
instruction (CALL_PAL SWPCTX); see Section 2.6 and Chapter 4. This causes 
address translations for pages with PTE<ASM> clear to be invalidated on a processor 
that does not implement address space numbers. Additionally, when the software 
changes any part (except for the Software field) of a valid Page Table Entry, it must 
also move a virtual address within the corresponding page to the Translation Buffer 
Invalidate Single (TBIS) internal processor register with the MTPR instruction; see 
Chapter 5. 
IMPLEMENTATION NOTE
Some implementations may invalidate the entire 
Translation Buffer on an MTPR to TBIS. In general, 
implementations may invalidate more than the required 
translations in the TB. 


The entire Translation Buffer can be invalidated by doing a write to Translation
Buffer Invalidate All register (CALL_PAL MTPR_TBIA), and all ASM=0 entries can
be invalidated by doing a write to Translation Buffer Invalidate All Process register
(CALL_PAL MTPR_TBIAP); see Chapter 5.


The Translation Buffer must not store invalid PTEs. Therefore, the software is not 
required to invalidate Translation Buffer entries when making changes for PTEs 
that are already invalid. 


The TBCHK internal processor register is available for interrogating the presence 
of a valid translation in the Translation Buffer; see Chapter 5. 


IMPLEMENTATION NOTE 

Hardware implementors should be aware that a single, 
direct mapped TB has a potential problem when a load/store
instruction and its data map to the same TB
location. If TB misses are handled in PALcode, there 
could be an endless loop unless the instruction is held 
in an instruction buffer or a translated physical PC is 
maintained by the hardware. 


3.9 Address Space Numbers 


The Alpha architecture allows a processor to optionally implement address space 
numbers (process tags) to reduce the need for invalidation of cached address 
translations for process specific addresses when a context switch occurs. The 
supported ASN range is 0..MAX_ASN. \ MAX_ASN is provided in the HWRPB
MAX_ASN field; see Platform Section, Chapter 3 for a detailed description of the 
HWRPB. \ 


NOTE 
If an ASN outside of the range 0..MAX_ASN is
assigned to a process, the operation of the processor is
UNDEFINED.


The address space number for the current process is loaded by software in the 
Address Space Number (ASN) internal processor register with a Swap Privileged 
Context instruction. ASNs are processor specific and the hardware makes no attempt 
to maintain coherency across multiple processors. In a multiprocessor system, 
software is responsible for ensuring the consistency of TB entries for processes that 
might be rescheduled on different processors. 
\ Systems that support ASNs should have MAX_ASN in the range 13..65535. The
number of ASNs should be determined by the market a system is targeting. \ 


PROGRAMMING NOTE 
System software should not assume that the number
of ASNs is a power of two. This allows, for example,
hardware to use N TB tag bits to encode (2**N)-3 ASN
values, one value for ASM=1 PTEs, and one for invalid.


There are several possible ways of using ASNs. There 
are several complications in a multiprocessor system. 
Consider the case where a process that executed on 
processor-1 is rescheduled on processor-2. If a page
is deleted or its protection is changed, the TB in
processor-1 has stale data. One solution would be to
send an interprocessor interrupt to all the processors on 
which this process could have run and cause them to 
invalidate the changed PTE. This results in significant 
overhead in a system with several processors. Another 
solution would be to have software invalidate all TB 
entries for a process on a new processor before it can 
begin execution, if the process executed on another 
processor during its previous execution. This ensures 
the deletion of possibly stale TB entries on the new 
processor. A third solution would assign a new ASN 
whenever a process is run on a processor that is not the 
same as the last processor on which it ran. 
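
The third solution in the note above can be sketched in software as follows. This is a minimal illustration in Python, not part of the architecture: the names `AsnAllocator`, `Process`, and `schedule`, the per-processor generation counter, and the `tbiap` callback (standing in for CALL_PAL MTPR_TBIAP) are all assumptions made for the example.

```python
class AsnAllocator:
    """Hypothetical per-processor ASN allocator (illustration only)."""

    def __init__(self, max_asn, tbiap):
        self.max_asn = max_asn      # would come from the HWRPB MAX_ASN field
        self.tbiap = tbiap          # stand-in for CALL_PAL MTPR_TBIAP
        self.next_asn = 0
        self.generation = 0         # bumped each time the ASN range is reused

    def allocate(self):
        # When 0..MAX_ASN is exhausted, invalidate all ASM=0 TB entries on
        # this processor and start handing out ASNs from zero again.
        if self.next_asn > self.max_asn:
            self.tbiap()
            self.generation += 1
            self.next_asn = 0
        asn = self.next_asn
        self.next_asn += 1
        return asn


class Process:
    def __init__(self, name):
        self.name = name
        self.last_cpu = None
        self.generation = None
        self.asn = None


def schedule(process, cpu_id, allocators):
    """Assign a fresh ASN if the process last ran on another processor,
    or if its ASN predates the most recent rollover on this processor."""
    alloc = allocators[cpu_id]
    if process.last_cpu != cpu_id or process.generation != alloc.generation:
        process.asn = alloc.allocate()
        process.generation = alloc.generation
        process.last_cpu = cpu_id
    return process.asn
```

Because a migrating process never reuses its old ASN on the processor it returns to, any stale TB entries tagged with the old ASN are simply never matched; they disappear at the next rollover invalidate.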


3.10 Memory Management Faults 


Five types of faults are associated with memory access and protection: 


Access Control Violation (ACV) 


Taken when the protection field of the third-level PTE that maps the data 
indicates that the intended page reference would be illegal in the specified access 
mode. An Access Control Violation fault is also taken if the KRE bit is zero in 
an invalid first or second level PTE. 


Fault On Read (FOR)


Occurs when a read is attempted with PTE<FOR> set.


Fault On Write (FOW)


Occurs when a write is attempted with PTE<FOW> set.


Fault On Execute (FOE)


Occurs when instruction execution is attempted with PTE<FOE> set.


Translation Not Valid (TNV)


Taken when a read or write reference is attempted through an invalid PTE in a
first-, second-, or third-level page table.


See Chapter 6 for a detailed description of these faults. 


Note that these five faults have distinct vectors in the System Control Block. The 
Access Violation (ACV) fault takes precedence over the faults TNV, FOR, FOW, and 
FOE. The Translation Not Valid (TNV) fault takes precedence over the faults FOR, 
FOW, and FOE. 


The faults FOR and FOW can occur simultaneously in the CALL_PAL queue 
instructions, in which case the order that the exceptions are taken is 
UNPREDICTABLE; see Section 2.1.
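
The precedence rules above can be summarized in a short sketch. The function name `select_fault` and its boolean arguments are hypothetical; the tie-break between FOR and FOW is arbitrary here because the architecture leaves that order UNPREDICTABLE.

```python
def select_fault(acv, tnv, fo_read=False, fo_write=False, fo_execute=False):
    """Return the name of the fault to take, or None if there is no fault.

    Precedence follows the text: ACV over everything, then TNV, then the
    Fault On Read/Write/Execute group.
    """
    if acv:
        return "ACV"
    if tnv:
        return "TNV"
    pending = [name for name, hit in
               (("FOR", fo_read), ("FOW", fo_write), ("FOE", fo_execute))
               if hit]
    if not pending:
        return None
    # FOR and FOW can coincide in the CALL_PAL queue instructions; the
    # order is UNPREDICTABLE, so this sketch just reports the first one.
    return pending[0]
```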


3.11 REVISION HISTORY


Revision 5.0, May 12, 1992

1. Added spacing to code_examples

2. Term level replaces seg in address translation sect

3. Added ECO #17, address translation performance enhancements

4. Converted to SDML

5. Integrate references for Console ECO #15


Revision 4.0, March 29, 1991

1. Typos

2. Clarify reference to TNV and FOx as mutually exclusive

3. Expand on reference to simultaneous occurrence of FOR and FOW in section
"Memory Management Faults"


Revision 3.0, Mar 2, 1990

1. Change ASN to variable size

2. Remove Huge pages and add Granularity hint

3. Add rule on changing PTEs from valid to invalid


Revision 2.0, October 4, 1989

1. Remove references to buffer space

2. Add note that PTE<6:7> are not to be used by software

3. Change name of large pages to huge pages.

4. Add implementation dependent use of high order PFN bits to specify caching
policy.


Revision 1.0, May 23, 1989

1. First review distribution.


Chapter 4 
OpenVMS Process Structure (II)


4.1 Process Definition 


A process is the basic entity that is scheduled for execution by the processor. A 
process represents a single thread of execution and consists of an address space and 
both hardware and software context. 


The hardware context of a process is defined by: 

• 31 Integer registers and 31 Floating-point registers
• Processor Status (PS)
• Program Counter (PC)
• 4 stack pointers
• Asynchronous System Trap Enable and summary registers (ASTEN, ASTSR)
• Process Page Table Base Register (PTBR)
• Address Space Number (ASN)
• Floating Enable Register (FEN)
• Process Cycle Counter (PCC)
• Process Unique value
• Data Alignment Trap (DAT)
• Performance Monitoring Enable Register (PME)


The software context of a process is defined by operating system software and is 
system dependent. 


A process may share the same address space with other processes or have an address
space of its own. There is, however, no separate address space for system software, 
and therefore, the operating system must be mapped into the address space of each 
process; see Chapter 3. 


In order for a process to execute, its hardware context must be loaded into the integer 
registers, Floating-point registers, and internal processor registers. While a process 
is executing, its hardware context is continuously updated. When a process is not 
being executed, its hardware context is stored in memory. 


Saving the hardware context of the current process in memory, followed by loading 
the hardware context for a new process, is termed context switching. Context 
switching occurs as one process after another is scheduled by the operating system
for execution. 


4.2 Hardware Privileged Process Context 


The hardware context of a process is defined by a privileged part which is context 
switched with the Swap Privileged Context instruction (SWPCTX) (see Section 2.6), 
and a non-privileged part which is context switched by operating system software. 


When a process is not executing, its privileged context is stored in a 128-byte
naturally aligned memory structure called the Hardware Privileged Context Block
(HWPCB).


Figure 4-1: Hardware Privileged Context Block


Offset    Contents (bits 63..0)

:HWPCB    Kernel Stack Pointer (KSP)
:+8       Executive Stack Pointer (ESP)
:+16      Supervisor Stack Pointer (SSP)
:+24      User Stack Pointer (USP)
:+32      Page Table Base Register (PTBR)
:+40      Address Space Number (ASN), bits <15:0>
:+48      ASTSR, bits <7:4>; ASTEN, bits <3:0>
:+56      DAT, bit <63>; PME, bit <62>; FEN, bit <0>
:+64      Process Cycle Counter (PCC)
:+72      Process Unique Value
:+80      PALcode Scratch Area of 6 Quadwords


The Hardware Privileged Context Block (HWPCB) for the current process is specified
by the Privileged Context Block Base register (PCBB); see Chapter 5.


The Swap Privileged Context instruction (SWPCTX) saves the privileged context of
the current process into the HWPCB specified by PCBB, loads a new value into
PCBB, and then loads the privileged context of the new process into the appropriate
hardware registers.



The new value loaded into PCBB, as well as the contents of the Privileged Context 
Block, must satisfy certain constraints or an UNDEFINED operation results: 

1. The physical address loaded into PCBB must be 128-byte aligned and must describe
sixteen contiguous quadwords that are in a memory-like region; see Common
Architecture, Chapter 5.


2. The value of PTBR must be the Page Frame Number of an existent page that is 
in a memory-like region. 


It is the responsibility of the operating system to save and load the non-privileged
part of the hardware context.
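
As an illustration of the structure and constraints described in this section, the following Python sketch packs a 128-byte HWPCB image and checks the PCBB alignment rule. The function names are hypothetical. The placement of FEN in (HWPCB+56)<0> and DAT in (HWPCB+56)<63> follows the register descriptions in Chapter 5; the order of the remaining quadwords and the AST field placement are assumptions of this sketch.

```python
import struct

HWPCB_SIZE = 128          # bytes: sixteen contiguous quadwords
FEN_BIT = 1 << 0          # (HWPCB+56)<0>
DAT_BIT = 1 << 63         # (HWPCB+56)<63>


def pack_hwpcb(ksp, esp, ssp, usp, ptbr, asn, astsr, asten,
               fen, dat, pcc, unique):
    """Pack one HWPCB image as sixteen little-endian quadwords.

    Assumed order: four stack pointers, PTBR, ASN, AST state, flags,
    PCC, Process Unique value, then six PALcode scratch quadwords.
    """
    ast = ((astsr & 0xF) << 4) | (asten & 0xF)   # assumed field placement
    flags = (FEN_BIT if fen else 0) | (DAT_BIT if dat else 0)
    quads = [ksp, esp, ssp, usp, ptbr, asn, ast, flags, pcc, unique]
    quads += [0] * 6                             # PALcode scratch area
    return struct.pack("<16Q", *quads)


def pcbb_is_aligned(physical_address):
    """Constraint 1 above: the PCBB value must be 128-byte aligned."""
    return physical_address % HWPCB_SIZE == 0
```

A real operating system would also have to verify constraint 2 (PTBR names an existent page in a memory-like region), which depends on platform memory maps that this sketch does not model.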


The SWPCTX instruction returns ownership of the current HWPCB to operating 
system software and passes ownership of the new HWPCB from the operating system 
to the processor. Any attempt to write a HWPCB while ownership resides with the 
processor has UNDEFINED results. If the HWPCB is read while ownership resides 
with the processor, it is UNPREDICTABLE whether the original or an updated value 
of a field is read. The processor is free to update an HWPCB field at any time. The 
decision as to whether or not a field is updated is made individually for each field. 


If ASNs are not implemented, the ASN field is not read or written by PALcode. 
The FEN bit reflects the setting of the FEN IPR. 


The DAT bit controls whether data alignment traps that are fixed up in PALcode 
are reported to the operating system. If the bit is clear, the trap is reported. If the 
bit is set, after the fixup, return is to the user. See Section 6.6. 


Setting the PME bit alerts any performance hardware or software in the system to 
monitor the performance of this process. 


The Process Unique value is that value used in support of multithread 
implementations. The value is stored in the HWPCB when the process is not active. 
When the process is active, the value may be cached in hardware internal storage 
or kept in the HWPCB only. 


4.3 Asynchronous System Traps (AST) 


Asynchronous System Traps (ASTs) are a means of notifying a process of events that 
are not synchronized with its execution but which must be dealt with in the context 
of the process with minimum delay. 


Asynchronous System Traps (ASTs) interrupt process execution and are controlled by 
the AST Enable (ASTEN) and AST Summary (ASTSR) internal processor registers; 
see Chapter 5. 


The AST Enable register (ASTEN) contains an enable bit for each of the four 
processor access modes. When the bit corresponding to an access mode is set, 
ASTs for that mode are enabled. The AST enable bit for an access mode may be 
changed by executing a Swap AST Enable instruction (SWASTEN; see Section 2.6), 
or by executing a Move To Processor Register instruction specifying ASTEN (MTPR 
ASTEN; see Chapter 5). 


The AST Summary Register (ASTSR) contains a pending bit for each of the four
processor access modes. When the bit corresponding to an access mode is set, an
AST is pending for that mode.


Kernel mode software may request an AST for a particular access mode by executing
a Move To Processor Register instruction specifying ASTSR (MTPR ASTSR; see
Chapter 5).


Hardware or PALcode monitors the state of ASTEN, ASTSR, PS<CM>, and 
PS<IPL>. If PS<IPL> is less than 2, and there is an AST pending and enabled 
for an access mode that is less than or equal to PS<CM> (i.e. an equal or more 
privileged access mode), an AST is initiated at IPL 2. 


ASTs that are pending and enabled for a less privileged access mode are not allowed 
to interrupt execution in a more privileged access mode. 
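
The delivery rule above can be expressed as a small predicate. This is a sketch only: the function name is hypothetical, access modes are encoded as Kernel=0 through User=3, and choosing the most privileged eligible mode first is an assumption, since the text does not state an order among several eligible modes.

```python
KERNEL, EXECUTIVE, SUPERVISOR, USER = range(4)


def pending_ast_mode(asten, astsr, current_mode, ipl):
    """Return the access mode for which an AST should be initiated, or None.

    An AST is initiated (at IPL 2) only if PS<IPL> is less than 2 and an
    AST is both pending and enabled for a mode <= PS<CM>.
    """
    if ipl >= 2:
        return None
    ready = asten & astsr & 0xF
    for mode in range(current_mode + 1):   # equal or more privileged modes
        if ready & (1 << mode):
            return mode
    return None
```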


4.4 Process Context Switching 


Process context switching occurs as one process after another is scheduled for 
execution by operating system software. Context switching requires the hardware 
context of one process to be saved in memory followed by the loading of the hardware 
context for another process into the hardware registers.


The privileged hardware context is swapped with the CALL_PAL Swap Privileged 
Context instruction (SWPCTX). Other hardware context must be saved and restored 
by operating system software. 


The sequence in which process context is changed is important since the SWPCTX 
instruction changes the environment in which the context switching software itself 
is executing. Also, although not enforced by hardware, it is advisable to execute 
the actual context switching software in an environment which cannot be context 
switched (i.e. at an IPL high enough that rescheduling cannot occur). 


The SWPCTX instruction is the only method provided for loading certain internal 
processor registers. The SWPCTX instruction always saves the privileged context of 
the old process and loads the privileged context of a new process. Therefore, a valid 
HWPCB must be available to save the privileged context of the old process as well 
as load the privileged context of the new process.


At system initialization, a valid HWPCB is constructed in the Hardware Restart 
Parameter Block (HWRPB) for the primary processor; see Platform Section, Chapter 
3. Thereafter, it is the responsibility of operating system software to ensure a valid 
HWPCB when executing a SWPCTX instruction.
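
The save/load behavior of SWPCTX described in this section can be modeled in a few lines. The `Processor` class, the dictionary used as physical memory, and the exception raised for an invalid HWPCB are assumptions of this sketch; on real hardware an invalid HWPCB gives UNDEFINED operation rather than an error.

```python
class Processor:
    """Toy model of the privileged context swap (illustration only)."""

    def __init__(self, boot_pcbb, memory):
        self.memory = memory                      # physical address -> HWPCB
        self.pcbb = boot_pcbb                     # bootstrap HWPCB address
        self.context = dict(memory[boot_pcbb])    # live privileged context

    def swpctx(self, new_pcbb):
        if new_pcbb % 128 != 0 or new_pcbb not in self.memory:
            raise ValueError("invalid HWPCB: operation is UNDEFINED")
        # 1. Save the old privileged context into the HWPCB named by PCBB.
        self.memory[self.pcbb] = dict(self.context)
        # 2. Load the new value into PCBB.
        self.pcbb = new_pcbb
        # 3. Load the privileged context of the new process.
        self.context = dict(self.memory[new_pcbb])
```

The model makes the point in the text concrete: there is always an old context to save, so a valid HWPCB must exist from system initialization onward.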


4.5 REVISION HISTORY


Revision 5.0, May 12, 1992

1. Corrected PME description, added process unique value description

2. Added PME, DAT and process unique value to Process definition

3. Added PME bit as per ECO #43

4. Corrected DAT bit description as per ECO #40

5. Added DAT bit and FEN bit description

6. Converted to SDML

7. Added ECO #18, #21

8. Changed 'CC' to 'PCC' in HWPCB

9. Integrate references for Console ECO #15


Revision 4.0, March 29, 1991

1. Remove references to ASTs as 'interrupts', substituting 'exception' where
appropriate.


Revision 3.0, Mar 2, 1990

1. Lower number of PAL scratch words from 23 to 7

2. Make ASN field be ignored on systems that do not implement ASNs

3. Change ASTRR to ASTSR

4. Change alignment of HWPCB


Revision 2.0, October 4, 1989

1. Add FEN, CC, and PAL scratch areas to HWPCB


Revision 1.0, May 23, 1989

1. First review distribution.




Chapter 5
OpenVMS Internal Processor Registers (II)


5.1 Internal Processor Registers 


This chapter describes the OpenVMS Alpha Internal Processor Registers (IPRs). 
These registers are read and written with Move From Processor Register (MFPR) 
and Move To Processor Register (MTPR) instructions; see Section 2.6. 


These instructions accept an input operand in R16 and return a result, if any, in
R0. Registers R1, R16, and R17 are UNPREDICTABLE after a CALL_PAL MxPR
routine. If a CALL_PAL MxPR routine does not return a result in R0, then R0 is
also UNPREDICTABLE on return.


Some IPRs (for example, ASTSR, ASTEN, IPL) may be both read and written in a 
combined operation by performing an MTPR instruction. 


Internal Processor Registers may or may not be implemented as actual hardware 
registers. An implementation may choose any combination of PALcode and hardware 
to produce the architecturally specified function. 


Internal Processor Registers are only accessible from Kernel mode. 


5.2 Stack Pointer Internal Processor Registers 


The stack pointers for User, Supervisor, and Executive stacks are accessible as IPRs 
through the CALL_PAL MTPR and MFPR instructions. An implementation may 
retain some or all of these stack pointers only in the HWPCB. In this case, MTPR and 
MFPR for these registers must access the corresponding HWPCB locations. However,
implementations that have these stack pointers in internal hardware registers are 
not required to access the corresponding HWPCB locations for MTPR and MFPR. 
The HWPCB locations get updated when a SWPCTX instruction is executed. 


An implementation may also choose to keep the Kernel Stack Pointer (KSP) in an 
internal hardware register (labelled IPR_KSP); however, this register is not directly 
accessible through MTPR and MFPR instructions. Because access to the KSP
requires Kernel mode, the actual KSP is the current mode stack pointer (R30); thus 
access to KSP is provided through R30 and no MTPR or MFPR access is required. 
PALcode routines can directly access IPR_KSP as needed. 


At System Initialization, the value of the KSP is taken from the initial HWPCB (see
Chapter 4).


5.3 IPR Summary


Table 5-1: Internal Processor Register (IPR) Summary

                                                      Input      Output     Context
Register Name                    Mnemonic  Access¹    R16        R0         Switched

Address Space Number             ASN       R          -          number     yes
AST Enable                       ASTEN     R/W*       mask       mask       yes
AST Summary Register             ASTSR     R/W*       mask       mask       yes
Data Align Trap Fixup            DATFX     W          value      -          yes
Floating-point Enable            FEN       R/W        value      value      yes
Interprocessor Int. Request      IPIR      W          number     -          no
Interrupt Priority Level         IPL       R/W*       value      value      no
Machine Check Error Summary      MCES      R/W        value      value      no
Performance Monitor              PERFMON   W*         IMP        IMP        no
Privileged Context Block Base    PCBB      R          -          address    no
Processor Base Register          PRBR      R/W        value      value      no
Page Table Base Register         PTBR      R          -          frame      yes
System Control Block Base        SCBB      R/W        frame      frame      no
Software Int. Request Register   SIRR      W          level      -          no
Software Int. Summary Register   SISR      R          -          mask       no
TB Check                         TBCHK     R          number     status     no
TB Invalid. All                  TBIA      W          -          -          no
TB Invalid. All Process          TBIAP     W          -          -          no
TB Invalid. Single               TBIS      W          address    -          no
TB Invalid. Single Data          TBISD     W          address    -          no
TB Invalid. Single Instruct.     TBISI     W          address    -          no
Kernel Stack Pointer             KSP       None       -          -          yes
Exec Stack Pointer               ESP       R/W        address    address    yes
Supervisor Stack Pointer         SSP       R/W        address    address    yes
User Stack Pointer               USP       R/W        address    address    yes
Virtual Page Table Base          VPTB      R/W        address    address    no
Who-Am-I                         WHAMI     R          -          number     no

¹Access symbols are defined in Table 5-2.



Table 5-2: Internal Processor Register (IPR) Access Summary

Access   Meaning

R        Access by MFPR only.
W        Access by MTPR only.
R/W      Access by MFPR or MTPR.
W*       Read and Write access accomplished by MTPR; see Section 5.1 for details.
R/W*     Access by MFPR or MTPR. Read and Write access accomplished by MTPR;
         see Section 5.1 for details.
None     Not accessible by MTPR or MFPR; accessed by PALcode routines as needed.


5.3.1 Address Space Number (ASN)

Access:

Read

Operation:

IF {ASNs are implemented} THEN
    R0 ← ZEXT(ASN)
ELSE
    R0 ← 0

Value at System Initialization:

Zero

Format:

Figure 5-1: Address Space Number Register (ASN)

[Figure: R0, bits <63:0>, holding the zero-extended ASN]

Description:


Address Space Numbers (ASNs) are used to further qualify Translation Buffer
references; see Chapter 3. If ASNs are implemented, the current ASN may be read
by executing an MFPR instruction specifying ASN.


As processes are scheduled for execution, the ASN for the next process to execute
is loaded using the Swap Privileged Context (SWPCTX) instruction; see Chapters 2
and 4.


The ASN register is an implicit operand to the CALL_PAL MFPR_IPR, TBCHK,
and TBISx PALcode instructions, in which it is used to qualify the virtual address
supplied in R16.


5.3.2 AST Enable (ASTEN)

Access:

Read
Write*

Operation:

R0 ← ZEXT(ASTEN<3:0>)                                  ! Read (MFPR)

R0 ← ZEXT(ASTEN<3:0>)                                  ! Write* (MTPR)
ASTEN<3:0> ← {{ASTEN<3:0> AND R16<3:0>} OR R16<7:4>}
{check for pending ASTs}

Value at System Initialization:

Zero

Format:

Figure 5-2: AST Enable Register (ASTEN)

[Figure: format of R0; bits <3:0> hold UEN, SEN, EEN, KEN (bit 3 down to bit 0)]





Description: 


The AST Enable Register records the AST enable state for each of the modes:
Kernel (KEN), Executive (EEN), Supervisor (SEN) and User (UEN). By writing R16
appropriately and then executing an MTPR instruction specifying ASTEN, the value
of ASTEN may be simultaneously read and modified. R16 contains bit masks used
to determine the new value of ASTEN:


• Bits R16<0> and R16<4> control the new state of Kernel enable.
• Bits R16<1> and R16<5> control the new state of Executive enable.
• Bits R16<2> and R16<6> control the new state of Supervisor enable.
• Bits R16<3> and R16<7> control the new state of User enable.


An MFPR to ASTEN reads the current value of ASTEN and returns this value
in R0.


An MTPR to ASTEN begins by reading the current value of ASTEN and returning
this value in R0. The current value of ASTEN is then ANDed with bits R16<3:0>;
these bits preserve (if set to '1') or clear (if equal to '0') the current state of their
corresponding enable modes. The value produced by this operation is then ORed
with bits R16<7:4>; these bits turn on (if set to '1') or do not affect (if equal to
'0') their corresponding enable modes. The resulting value is then written to
ASTEN.


NOTE 
All AST enables can be cleared by loading a zero into 
R16 and executing an MTPR instruction specifying 
ASTEN. To enable an AST for a given mode, load R16 
with a mask that has bits <3:0> set and one of the bits 
<7:4> corresponding to the AST mode to be set. Then 
execute an MTPR instruction specifying ASTEN. 
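
The combined read and modify performed by MTPR ASTEN can be written out directly from the Operation section. The helper name `mtpr_ast` is hypothetical; the same arithmetic applies to any register that uses this R16 mask convention, such as ASTSR.

```python
def mtpr_ast(current, r16):
    """Return (r0, new_value) for an MTPR to ASTEN (or ASTSR).

    r0 is the old 4-bit value, zero-extended; new_value is
    {current AND R16<3:0>} OR R16<7:4>.
    """
    r0 = current & 0xF
    keep = r16 & 0xF              # R16<3:0>: 1 preserves a bit, 0 clears it
    turn_on = (r16 >> 4) & 0xF    # R16<7:4>: 1 sets a bit, 0 leaves it alone
    return r0, ((current & keep) | turn_on) & 0xF
```

For example, `mtpr_ast(0b0001, 0x8F)` keeps all existing enables and turns on the User enable, returning the old value `0b0001` and the new value `0b1001`.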


ASTEN is not present in the VAX architecture. It was added to the Alpha
architecture to allow software (especially nonprivileged software) to enable and
disable ASTs efficiently for the current mode via the SWASTEN instruction. It is
anticipated that, with multitasking, it will become extremely important to be able
to enable and disable ASTs in an efficient manner in shareable runtime support
routines.


As processes are scheduled for execution, the state of the AST enables for the 
next process to execute is loaded using the Swap Privileged Context (SWPCTX) 
instruction. The Swap AST Enable (SWASTEN) instruction can be used to change 
the enable state for the current access mode; see Chapters 2 and 4.


5.3.3 AST Summary Register (ASTSR)

Access:

Read
Write*

Operation:

R0 ← ZEXT(ASTSR<3:0>)                                  ! Read (MFPR)

R0 ← ZEXT(ASTSR<3:0>)                                  ! Write* (MTPR)
ASTSR<3:0> ← {{ASTSR<3:0> AND R16<3:0>} OR R16<7:4>}
{check for pending ASTs}

Value at System Initialization:

Zero

Format:

Figure 5-3: AST Summary Register (ASTSR)

[Figure: R16 bits <7:4> are UON, SON, EON, KON and bits <3:0> are UCL, SCL, ECL, KCL;
R0 bits <3:0> are UPD, SPD, EPD, KPD]

Description:


The AST Summary Register records the AST pending state for each of the modes: 
Kernel (KPD), Executive (EPD), Supervisor (SPD), and User (UPD). 


By writing R16 appropriately and then executing an MTPR instruction specifying 
ASTSR, the value of ASTSR may be simultaneously read and modified. R16 contains 
bit masks used to determine the new value of ASTSR: 


• Bits R16<0> and R16<4> control the new state of Kernel pending.
• Bits R16<1> and R16<5> control the new state of Executive pending.
• Bits R16<2> and R16<6> control the new state of Supervisor pending.
• Bits R16<3> and R16<7> control the new state of User pending.


An MFPR reads the current value of ASTSR and returns this value in R0.


An MTPR to ASTSR begins by reading the current value of ASTSR and returning
this value in R0. The current value of ASTSR is then ANDed with bits R16<3:0>;
these bits preserve (if set to '1') or clear (if equal to '0') the current state of their
corresponding pending modes. The value produced by this operation is then ORed
with bits R16<7:4>; these bits turn on (if set to '1') or do not affect (if equal to
'0') their corresponding pending modes. The resulting value is then written to
ASTSR.


NOTE
All AST requests can be cleared by loading a zero in R16 
and executing an MTPR instruction specifying ASTSR. 
To request an AST for a given mode, load R16 with a 
mask that has bits <3:0> set and one of the bits <7:4> 
corresponding to the AST mode to be set. Then execute 
an MTPR instruction specifying ASTSR. 


As processes are scheduled for execution, the pending AST state for the next process 
to execute is loaded using the Swap Privileged Context (SWPCTX) instruction; see 
Chapters 2 and 4. 


When the processor IPL is less than 2, and proper enabling conditions are present,
an AST interrupt is initiated at IPL 2 and the corresponding access mode bit in 
ASTSR is cleared; see Section 6.7.6. 


5.3.4 Data Alignment Trap Fixup (DATFX)

Access:

Write

Operation:

DATFX ← R16<0>
(HWPCB+56)<63> ← DATFX

Value at System Initialization:

Zero

Format:

Figure 5-4: Data Alignment Trap Fixup (DATFX)

[Figure: R16; bit <0> is DAT]

Description:


Data Alignment traps are fixed up in PALcode and are reported to the operating 
system under the control of the DAT bit. If the bit is zero, the trap is reported. 
For the LDx_L and STx_C instructions, no fixup is possible and an illegal operand 
exception is generated. For the description of the data alignment traps, see 
Section 6.6. 


5.3.5 Floating Enable (FEN)

Access:

Read/Write

Operation:

R0 ← ZEXT(FEN)             ! Read
FEN ← R16<0>               ! Write
(HWPCB+56)<0> ← FEN        ! Update PCB on Write

Value at System Initialization:

Zero

Format:

Figure 5-5: Floating Enable (FEN) Register

[Figure: bit <0> is FEN]

Description:


The Floating-point unit can be disabled. If the Floating Enable Register (FEN) is 
zero, all instructions that have floating registers as operands cause a Floating-point 
disabled fault; see Section 6.3.1.1.


5.3.6 Interprocessor Interrupt Request (IPIR)

Access:

Write

Operation:

IPIR ← R16

Value at System Initialization:

Not applicable

Format:

Figure 5-6: Interprocessor Interrupt Request Register (IPIR)

[Figure: R16, bits <63:0>, holding the processor number]


Description: 


An interprocessor interrupt can be requested on a specified processor by writing 
that processor’s number into the IPIR register through an MTPR instruction. The 
interrupt request is recorded on the target processor and is initiated when proper 
enabling conditions are present. 


PROGRAMMING NOTE 
The interrupt need not be initiated before the next 
instruction is executed on the requesting processor, even 
if the requesting processor is also the target processor 
for the request. 


For additional information on interprocessor interrupts, see Section 6.4.5.1. 


5.3.7 Interrupt Priority Level (IPL)

Access:

Read/Write*

Operation:

R0 ← ZEXT(PS<IPL>)         ! Read

R0 ← ZEXT(PS<IPL>)         ! Write*
PS<IPL> ← R16<4:0>         ! Write
{check for pending ASTs or interrupts}

Value at System Initialization:

31

Format:

Figure 5-7: Interrupt Priority Level (IPL)

[Figure: bits <63:5> are RAZ/SBZ; bits <4:0> are IPL]

Description:


An MFPR IPL returns the current interrupt priority level in R0. An MTPR IPL
returns the current interrupt priority level in R0 and sets the interrupt priority
level to the value in R16. If proper enabling conditions are present, an interrupt or 
AST is initiated prior to issuing the next instruction; see Sections 6.4.1 and 6.7.6. 
R16<63:5> are defined as RAZ/SBZ. Therefore, the presence of non-zero bits upon 
write in R16<63:5> may cause UNDEFINED results. 


5.3.8 Machine Check Error Summary Register (MCES)

Access:

Read/Write

Operation:

R0 ← ZEXT(MCES)                              ! Read

IF {R16<0> EQ 1} THEN MCES<0> ← 0            ! Write
IF {R16<1> EQ 1} THEN MCES<1> ← 0
IF {R16<2> EQ 1} THEN MCES<2> ← 0
MCES<3> ← R16<3>
MCES<4> ← R16<4>

Value at System Initialization:

Zero

Format:

Figure 5-8: Machine Check Error Summary Register (MCES)

[Figure: bits <63:32> are IMP; bits <4:3> disable correctable error reporting;
bit <2> is processor correctable error; bit <1> is system correctable error;
bit <0> is machine check]

Description:


The use of the MCES IPR is described in Section 6.5. 


MCES<0> is set by the hardware or PALcode when a processor or system machine 
check occurs. MCES<1> is set by the hardware or PALcode when a system
correctable error occurs. MCES<2> is set by the hardware or PALcode when a 
processor correctable error occurs. Writing a 1 to any of these three bits clears that 
bit. 


MCES<0> is cleared by the operating system machine check error handler and
used by the hardware or PALcode to detect double machine checks. MCES<1>
and MCES<2> are cleared by the operating system's system or processor
correctable error handlers; these bits are used to indicate that the associated
correctable error logout area may be reused by hardware or PALcode. In the event
of double correctable errors, PALcode does not overwrite the logout area and does
not force the processor to enter console I/O mode; see Section 6.5.1.


MCES<4:3> are used to disable reporting of correctable errors. When set, the error is 
corrected, but no system correctable error interrupt or processor correctable machine 
check is generated. 


Implementation dependent (IMP) bits may be used to report implementation specific 
errors. 
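
The write behavior of MCES mixes two conventions: bits <2:0> are write-one-to-clear, while bits <4:3> are copied from R16. The following sketch, with the hypothetical name `mtpr_mces`, shows the effect of a write on the low five bits; the implementation dependent (IMP) bits are not modeled.

```python
def mtpr_mces(mces, r16):
    """Return the new MCES<4:0> after an MTPR MCES with the given R16."""
    new = mces & 0x1F
    new &= ~(r16 & 0b00111)                   # a 1 in R16<2:0> clears that bit
    new = (new & 0b00111) | (r16 & 0b11000)   # MCES<4:3> are loaded from R16
    return new
```

Note that a handler clearing MCES<0> must also supply the desired values of bits <4:3> in R16, because those bits are rewritten on every MTPR.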


5.3.9 Performance Monitoring Register (PERFMON)

Access:

Write*

Operation:

! R<16> contains implementation specific input values
! R<0> may return implementation specific values
! Operations and actions taken are implementation specific

Value at System Initialization:

Implementation Dependent

Format:

Figure 5-9: Performance Monitoring Register (PERFMON)

[Figure: bits <63:0>; contents are implementation specific]


Description: 


The arguments and actions of this performance monitoring function are platform 
and chip dependent. The functions, when defined for an implementation, are to be 
registered in Appendix E. 


R<16> contains implementation dependent input values. Implementation specific 
values may be returned in R<0>. 


5.3.10 Privileged Context Block Base (PCBB)

Access:

Read

Operation:

R0 ← ZEXT(PCBB)

Value at System Initialization:

Address of processor's bootstrap HWPCB

Format:

Figure 5-10: Privileged Context Block Base Register (PCBB)

[Figure: R0, with fields <63:48> and <47:0>; the physical address of the HWPCB is in <47:0>]


Description: 


The Privileged Context Block Base Register contains the physical address of the 
privileged context block for the current process. It may be read by executing an 
MFPR instruction specifying PCBB.


PCBB is written by the Swap Privileged Context (SWPCTX) instruction; see 
Chapters 2 and 4. 


5.3.11 Processor Base Register (PRBR)

Access:

Read/Write

Operation:

R0 ← PRBR                  ! Read
PRBR ← R16                 ! Write

Value at System Initialization:

UNPREDICTABLE

Format:

Figure 5-11: Processor Base Register (PRBR)

[Figure: bits <63:0> hold an operating system-dependent value]


Description: 


In a multiprocessor system, it is desirable for the operating system to be able to 
locate a processor-specific data structure in a simple and straightforward manner. 
The Processor Base Register provides a quadword of operating system-dependent 
state that can be read and written via MFPR and MTPR instructions that specify 
PRBR. 


5.3.12 Page Table Base Register (PTBR)

Access:

Read

Operation:

R0 ← PTBR

Value at System Initialization:

Value in the bootstrap HWPCB

Format:

Figure 5-12: Page Table Base Register (PTBR)

[Figure: R0; bits <63:32> are RAZ; bits <31:0> are the Page Frame Number]


Description: 


The Page Table Base Register contains the page frame number of the first-level page 
table for the current process. It may be read by executing an MFPR instruction 
specifying PTBR; see Chapter 3. 


As processes are scheduled for execution, the PTBR for the next process to execute 
is loaded using the Swap Privileged Context (SWPCTX) instruction; see Chapters 2 
and 4. 


5.3.13 System Control Block Base (SCBB)

Access:

Read/Write

Operation:

R0 ← ZEXT(SCBB)            ! Read
SCBB ← R16                 ! Write

Value at System Initialization:

UNPREDICTABLE

Format:

Figure 5-13: System Control Block Base Register (SCBB)

[Figure: bits <63:32> are IGN/RAZ; bits <31:0> are the Page Frame Number]


Description: 


The System Control Block Base Register holds the Page Frame Number (PFN) of 
the System Control Block, which is used to dispatch exceptions and interrupts, and 
may be read and written by executing MFPR and MTPR instructions that specify 
SCBB; see Section 6.6. 


When SCBB is written, the specified physical address must be the PFN of a page 
which is neither in I/O space nor non-existent memory, or UNDEFINED operation 
will result.
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5.3.14 Software renee Request Register (SIRR) 
Access: 
Write 
Operation: 
IF R16<3:0> NE 0 THEN 


SISR<R16<3:0>> -— 1 


Value at System Initialization: 


Not applicable 
Format: 


Figure 5-14: Software Interrupt Request Register (SIRR)


63                                4 3     0
Description: 


A software interrupt may be requested for a particular Interrupt Priority Level 
(IPL) by executing an MTPR instruction specifying SIRR. Software interrupts may 
be requested at levels 0 through 15 (requests at level 0 are ignored). 


An MTPR SIRR sets the bit corresponding to the specified interrupt level in the
Software Interrupt Summary Register (SISR).


If proper enabling conditions are present, a software interrupt is initiated prior to 
issuing the next instruction; see Sections 6.4.1 and 6.7.6. 
5.3.15 Software Interrupt Summary Register (SISR)
Access: 
Read 


Operation: 


R0 ← ZEXT(SISR<15:0>)


Value at System Initialization: 


Zero 


Format:


Figure 5-15: Software Interrupt Summary Register (SISR)


63            16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
R0
Description: 


The Software Interrupt Summary Register records the interrupt pending state for 
each of the interrupt levels 1 through 15. The current interrupt pending state may 
be read by executing an MFPR instruction specifying SISR. 


MTPR SIRR (see SIRR) requests an interrupt at a particular interrupt level and 
sets the corresponding pending bit in SISR. 


When the processor IPL falls below the level of a pending request, an interrupt is 
initiated and the corresponding bit in SISR is cleared; see Sections 6.4.1 and 6.7.6. 
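
The interaction between SIRR, SISR, and the processor IPL described above can be
modeled in a few lines. The following Python sketch is illustrative only: the class
and method names are hypothetical, hardware interrupt levels are not modeled, and
the enabling conditions of Sections 6.4.1 and 6.7.6 are reduced to a simple IPL
comparison. Raising the IPL to the level of the request is an assumption of the
sketch.

```python
class SoftwareInterrupts:
    """Toy model of SIRR/SISR; names are illustrative, not architectural."""

    def __init__(self):
        self.sisr = 0   # SISR is zero at system initialization
        self.ipl = 0    # current processor IPL (0..31)

    def mtpr_sirr(self, r16):
        level = r16 & 0xF           # only R16<3:0> is examined
        if level != 0:              # requests at level 0 are ignored
            self.sisr |= 1 << level

    def mfpr_sisr(self):
        return self.sisr & 0xFFFF   # ZEXT(SISR<15:0>)

    def deliver(self):
        """Return the level of the interrupt taken, or None if none can be taken."""
        for level in range(15, 0, -1):
            if self.sisr & (1 << level) and self.ipl < level:
                self.sisr &= ~(1 << level)  # pending bit is cleared on initiation
                self.ipl = level            # assumption: IPL rises to the request level
                return level
        return None
```

A request at level 3 made while the processor runs at IPL 4 stays pending in SISR
until the IPL drops below 3, at which point deliver() returns it.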
5.3.16 Translation Buffer Check (TBCHK)
Access:
Read


Operation:


R0 ← 0
IF {implemented} THEN
    R0<0> ← {entry in TB for VA in R16}
ELSE
    R0<63> ← 1


Value at System Initialization: 


Correct results are always returned 
Format: 


Figure 5-16: Translation Buffer Check Register (TBCHK) 


63                                        0
R16

63 62                                 2 1 0

Description: 


The Translation Buffer Check Register provides the capability to determine if 
a virtual address is present in the Translation Buffer by executing an MFPR 
instruction specifying TBCHK; see Chapter 3.


The virtual address to be checked is specified in R16 and may be any address within 
the desired page. If ASNs are implemented, only those Translation Buffer entries 
which are associated with the current value of the ASN IPR will be checked for the 
virtual address. The value read contains an indication of whether the function is 
implemented and whether the virtual address is present in the Translation Buffer. 
If the function is not implemented, a value is returned with bit <63> set and bit <0>
clear. Otherwise, a value is returned with bit <63> clear, and with bit <0> indicating
whether the virtual address is present in (1) or absent from (0) the Translation
Buffer.


The TBCHK Register can be used by system software for working set management. 
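
As an illustration of how system software might interpret the value returned by
MFPR TBCHK, the following Python sketch decodes bits <63> and <0> as described
above. The function name and the returned strings are hypothetical.

```python
def decode_tbchk(r0):
    """Interpret the quadword returned in R0 by MFPR TBCHK."""
    r0 &= (1 << 64) - 1              # treat the value as an unsigned quadword
    if (r0 >> 63) & 1:               # bit <63> set: function not implemented
        return "not implemented"
    return "present" if r0 & 1 else "absent"   # bit <0>: entry in the TB
```

Software that uses TBCHK for working set management would need a fallback path
for the "not implemented" case.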
5.3.17 Translation Buffer Invalidate All (TBIA)
Access:
Write
Operation:
{Invalidate all TB entries}


Value at System Initialization: 


Not applicable 
Format: 


Figure 5-17: Translation Buffer Invalidate All Register (TBIA)


63                                        0
R16
Description: 


The Translation Buffer Invalidate All Register provides the capability to invalidate 
all entries in the Translation Buffer by executing an MTPR instruction specifying 
TBIA; see Chapter 3.
5.3.18 Translation Buffer Invalidate All Process (TBIAP)
Access:
Write
Operation: 
{Invalidate all TB entries with PTE<ASM> clear} 


Value at System Initialization: 


Not applicable 
Format: 


Figure 5-18: Translation Buffer Invalidate All Process Register (TBIAP)


63 0 
R16 


Description: 


The Translation Buffer Invalidate All Process Register provides the capability to 
invalidate all entries in the Translation Buffer that do not have the ASM bit set by 
executing an MTPR instruction specifying TBIAP; see Chapter 3.


Notes:
More entries than those specified may be invalidated by this operation. For
example, some implementations may flush the entire TB on a TBIAP.
5.3.19 Translation Buffer Invalidate Single (TBISx)


Access: 
Write 
Operation: 


TBIS: 
{Invalidate single Data TB entry using R16} 
{Invalidate single Instruction TB entry using R16} 

TBISD:
{Invalidate single Data TB entry using R16}

TBISI:
{Invalidate single Instruction TB entry using R16} 


Value at System Initialization: 


Not applicable 


Format: 


Figure 5-19: Translation Buffer Invalidate Single (TBIS)





R16 


Description: 


The Translation Buffer Invalidate Single Registers provide the capability to
invalidate a single entry in the Instruction Translation Buffer (TBISI), the Data
Translation Buffer (TBISD), or both translation buffers (TBIS). The virtual address
to be invalidated is passed in R16 and may be any address within the desired page.


Notes: 

More than the single entry may be invalidated by this operation. For example,
some implementations may flush the entire TB on a TBIS. As a result, if the
specified address does not match any entry in the Translation Buffer, then it is
implementation-dependent whether the state of the Translation Buffer is affected
by the operation.
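
The invalidate and check operations of Sections 5.3.16 through 5.3.19 can be
pictured with a small model. The Python sketch below is a simplification under
stated assumptions: a single unified TB (so TBIS, TBISD, and TBISI collapse into
one operation), no ASNs, an assumed 8 KB page size, and the minimum permitted
invalidation (an implementation may invalidate more). All names are hypothetical.

```python
PAGE_SHIFT = 13   # assumption: 8 KB pages

class TranslationBuffer:
    """Toy unified TB keyed by virtual page number."""

    def __init__(self):
        self.entries = {}   # virtual page number -> {"pfn": int, "asm": bool}

    def fill(self, va, pfn, asm=False):
        self.entries[va >> PAGE_SHIFT] = {"pfn": pfn, "asm": asm}

    def tbchk(self, va):
        # Any address within the page selects the same entry.
        return 1 if (va >> PAGE_SHIFT) in self.entries else 0

    def tbis(self, va):
        # Invalidate the single entry for the page containing va, if present.
        self.entries.pop(va >> PAGE_SHIFT, None)

    def tbiap(self):
        # Keep only entries whose PTE had the ASM bit set.
        self.entries = {vpn: e for vpn, e in self.entries.items() if e["asm"]}

    def tbia(self):
        self.entries.clear()
```

Because the model indexes entries by page number, tbis(0x2ABC) and tbis(0x2000)
invalidate the same entry, mirroring the rule that R16 may hold any address
within the desired page.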
5.3.20 Executive Stack Pointer (ESP)


Access:


Read/Write 


Operation: 


IF {internal registers for stack pointers} THEN ! Read
    R0 ← ESP
ELSE
    R0 ← (IPR_PCBB + HWPCB_ESP)


IF {internal registers for stack pointers} THEN ! Write
    ESP ← R16
ELSE
    (IPR_PCBB + HWPCB_ESP) ← R16


Value at System Initialization: 


Value in the initial HWPCB 
Format: 


Figure 5-20: Executive Stack Pointer (ESP) 


63                                        0
Description: 


This register allows the stack pointer for Executive mode (ESP) to be read and 
written via MFPR and MTPR instructions that specify ESP. 


The current stack pointer may be read and written directly by specifying scalar 
register SP (R30). 


As processes are scheduled for execution, the stack pointers for the next process to 
execute are loaded using the Swap Privileged Context (SWPCTX) instruction; see 
Section 2.6 and Chapter 4. 
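
The two alternatives in the Operation description (an internal register, or the
stack pointer slot in the HWPCB addressed through PCBB) can be sketched as
follows. This Python model is illustrative only: the class name is hypothetical,
memory is a dictionary of quadwords keyed by physical address, and the HWPCB
byte offsets are assumptions of the sketch rather than values taken from this
section.

```python
class StackPointerIPRs:
    # Assumed byte offsets of the saved stack pointers within the HWPCB.
    HWPCB_OFFSET = {"ESP": 8, "SSP": 16, "USP": 24}

    def __init__(self, pcbb, internal_registers):
        self.pcbb = pcbb
        # None means the implementation keeps stack pointers only in the HWPCB.
        self.internal = {} if internal_registers else None
        self.memory = {}    # physical address -> quadword

    def mtpr(self, name, r16):
        value = r16 & 0xFFFFFFFFFFFFFFFF
        if self.internal is not None:
            self.internal[name] = value
        else:
            self.memory[self.pcbb + self.HWPCB_OFFSET[name]] = value

    def mfpr(self, name):
        if self.internal is not None:
            return self.internal.get(name, 0)
        return self.memory.get(self.pcbb + self.HWPCB_OFFSET[name], 0)
```

Either way, software sees the same value from MFPR that it last wrote with MTPR;
only the storage location differs between implementations.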
5.3.21 Supervisor Stack Pointer (SSP)


Access:


Read/Write 


Operation: 


IF {internal registers for stack pointers} THEN ! Read
    R0 ← SSP
ELSE
    R0 ← (IPR_PCBB + HWPCB_SSP)


IF {internal registers for stack pointers} THEN ! Write
    SSP ← R16
ELSE
    (IPR_PCBB + HWPCB_SSP) ← R16


Value at System Initialization: 


Value in the initial HWPCB 
Format:


Figure 5-21: Supervisor Stack Pointer (SSP)


63                                        0
Description: 


This register allows the stack pointer for Supervisor mode (SSP) to be read and 
written via MFPR and MTPR instructions that specify SSP. 


The current stack pointer may be read and written directly by specifying scalar 
register SP (R30). 


As processes are scheduled for execution, the stack pointers for the next process to 
execute are loaded using the Swap Privileged Context (SWPCTX) instruction; see 
Section 2.6 and Chapter 4.
5.3.22 User Stack Pointer (USP)


Access: 


Read/Write 


Operation: 


IF {internal registers for stack pointers} THEN ! Read
    R0 ← USP
ELSE
    R0 ← (IPR_PCBB + HWPCB_USP)


IF {internal registers for stack pointers} THEN ! Write
    USP ← R16
ELSE
    (IPR_PCBB + HWPCB_USP) ← R16


Value at System Initialization: 


Value in the initial HWPCB 
Format: 


Figure 5-22: User Stack Pointer (USP) 


63                                        0


Stack Address 


Description: 


This register allows the stack pointer for User mode (USP) to be read and written 
via MFPR and MTPR instructions that specify USP. 


The current stack pointer may be read and written directly by specifying scalar 
register SP (R30). 


As processes are scheduled for execution, the two stack pointers for the next process 
to execute are loaded using the Swap Privileged Context (SWPCTX) instruction; see 
Section 2.6 and Chapter 4. 
5.3.23 Virtual Page Table Base (VPTB)


Access: 
Read/Write 
Operation: 
R0 ← VPTB ! Read
VPTB ← R16 ! Write


Value at System Initialization: 


Initialized by the console in the bootstrap address space. 
Format: 


Figure 5-23: Virtual Page Table Base Register (VPTB) 


63                                        0
Description: 


The Virtual Page Table Base Register contains the virtual address of the base of
the entire three-level page table structure. It may be read by executing an MFPR
instruction specifying VPTB. It is written at system initialization using an MTPR
instruction specifying VPTB. See Section 3.7.2 and Platform Section, Chapter 4
for initialization considerations.
5.3.24 Who-Am-I (WHAMI)


Access: 


Read 


Operation: 


R0 ← WHAMI


Value at System Initialization: 


Processor number 
Format: 


Figure 5-24: Who-Am-I Register (WHAMI)


63                                        0
R0
Description: 


The Who-Am-I Register provides the capability to read the current processor number
by executing an MFPR instruction specifying WHAMI. The processor number
returned is in the range 0 to one less than the maximum number of processors that
can be configured in the system. Processor number FFFF FFFF FFFF FFFF₁₆ is reserved.


The current processor number is useful in a multiprocessing system to index
arrays that store per-processor information. Such information is operating-system
dependent.
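
As a minimal illustration of indexing per-processor data by processor number,
consider the Python sketch below. The configuration limit, the contents of each
array element, and the function name are all hypothetical; only the reserved
value comes from the text above.

```python
MAX_PROCESSORS = 4                       # hypothetical configuration limit
RESERVED_WHAMI = 0xFFFF_FFFF_FFFF_FFFF   # reserved processor number

# One element of operating-system-dependent state per configurable processor.
per_cpu = [{"cpu": n, "idle_ticks": 0} for n in range(MAX_PROCESSORS)]

def per_cpu_data(whami):
    """Return the per-processor element selected by the value read from WHAMI."""
    if whami == RESERVED_WHAMI or not 0 <= whami < MAX_PROCESSORS:
        raise ValueError("invalid processor number")
    return per_cpu[whami]
```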
5.4 REVISION HISTORY
Revision 5.0, May 12, 1992 


1. Added changes to MCES for ECO #45
2. Added Perfmon IPR description and entry in summary table
3. Added DATFX-related ECOs #30, #40
4. Added bit field to FEN reference to PCBB as a result of DATFX ECOs
5. Added VPTB register
6. Rewrite of MCES description
7. Converted to SDML
8. Added ECO #16, #17, #20, #23, #24
9. Integrate references for Console ECO #15


Revision 4.0, March 29, 1991 


1. MTPR IPL returns old IPL in R0
2. Typos
3. Change MCES IGN/RAZ field to IMP
4. Describe how to clear and set mode enable bits with MTPR
5. Change text for ASTSR description to indicate future action for mode set
6. Change ASTEN and ASTSR to access type Read/Write
7. Modify (subtly) note under IPIR to avoid confusion about timing relation between
   processors
8. Clarify what value to load into IPIR to select a particular target
9. Change 'Value at System Initialization' from 'UNDEFINED' to 'UNPREDICTABLE'
   for PRBR and SCBB
10. Note effect of writing TBIS with an address that does not match any TB entry
11. Note that ASN is an implicit operand to a MFPR TBCHK instruction
12. Emphasise distinction between SIRR and SISR
13. Reworked IPR table to show which IPRs are context-switched and which are not.
14. Remove references to ASTs as 'interrupts', substituting 'exception' where
    appropriate
15. Insert spaces into long hex and binary values to improve legibility
16. Clarify obscure use of MTPR to both read and write certain IPRs
17. Illustrate R16 bits used to 'gate' ASTEN and ASTSR contents into R0
18. Add pointer in the IPIR IPR section pointing to Interprocessor Interrupt material
    in Chapter 6
19. Add Kernel Stack Pointer as an internal processor register
20. Modify definition of Absolute Time register and BB_WATCH entity.
21. Changed IPR Summary Table and added R/W* description
22. Specified all systems that support VAX or ULTRIX must have a BB_WATCH
23. Clarified value written to IPIR to select a processor


Revision 3.0, March 2, 1990 


1. Remove ASTRR and make ASTEN/ASTSR read/write
2. Add TBIAP
3. Remove ASN from TBIx and TBCHK
4. Remove R17 as input to MxPR's
5. Reserve processor number FFFF FFFF FFFF FFFF₁₆


Revision 2.0, October 4, 1989 


1. Remove ICIE, IPIE, ISP, KSP, SID, SSN, and TOY
2. Add AT and FEN
3. Change range of WHAMI
4. Remove stack alignment comments
5. Change registers used to match calling standard


Revision 1.0, March 15, 1989 


1. First review distribution.
Chapter 6
OpenVMS Exceptions, Interrupts, and Machine
Checks (II)


6.1 Introduction 


At certain times during the operation of a system, events within the system require 
the execution of software outside the explicit flow of control. When such an 
exceptional event occurs, an Alpha processor forces a change in control flow from
that indicated by the current instruction stream. The notification process for such
events is of one of three types: 


• Exceptions


These events are relevant primarily to the currently executing process and 
normally invoke software in the context of the current process. The three types 
of exceptions are faults, arithmetic traps, and synchronous traps. Exceptions are 
described in Section 6.3. 


• Interrupts


These events are primarily relevant to other processes, or to the system as a 
whole, and are typically serviced in a system-wide context. 


Some interrupts are of such urgency that they require high-priority service, while 
others must be synchronized with independent events. To meet these needs, each 
processor has priority logic that grants interrupt service to the highest priority 
event at any point in time. Interrupts are described in Section 6.4. 


• Machine Checks


These events are generally the result of serious hardware failure. The registers 
and memory are potentially in an indeterminate state such that the instruction 
execution cannot necessarily be correctly restarted, completed, simulated, or 
undone. Machine checks are described in Section 6.5. 


For all such events, the change in flow of control involves changing the Program
Counter (PC), possibly changing the execution mode (current mode) and/or interrupt 
priority level (IPL) in the Processor Status (PS), and saving the old values of the 
PC and PS. The old values are saved on the target stack as part of an Exception, 
Interrupt, or Machine Check Stack Frame. Collectively, those elements are described 
in Section 6.2.


The service routines that handle exceptions, interrupts, and machine checks are 
specified by entry points in the System Control Block (SCB), described in Section 6.6. 
Return from an exception, interrupt, or machine check is done via the CALL_PAL
REI instruction. As part of its work, CALL_PAL REI restores the saved values of 
PC and PS and pops them off the stack. 


6.1.1 Contrast Between Exceptions, Interrupts, and Machine Checks 


Generally, exceptions, interrupts, and machine checks are similar. However, there 
are four important differences:


1. An exception condition is caused by the execution of an instruction. An interrupt
   is caused by some activity in the system that may be independent of any
   instruction. A machine check is associated with a hardware error condition.

2. The IPL of the processor is not changed when the processor initiates an exception.
   The IPL is always raised when an interrupt is initiated. The IPL is always
   raised when a machine check is initiated, and for all machine checks other than
   system correctable, is raised to 31 (highest priority level). (For system correctable
   machine checks, the IPL is raised to 20.)

3. Exceptions are always initiated immediately, no matter what the processor IPL
   is. Interrupts are deferred until the processor IPL drops below the IPL of the
   requesting source. Machine checks can be initiated immediately or deferred,
   depending on error conditions.

4. Some exceptions can be selectively disabled by selecting instructions that do
   not check for exception conditions. If an exception condition occurs in such an
   instruction, the condition is totally ignored and no state is saved to signal that
   condition at a later time.

   If an interrupt request occurs while the processor IPL is equal to or greater than
   that of the interrupting source, the condition will eventually initiate an interrupt
   if the interrupt request is still present and the processor IPL is lowered below
   that of the interrupting source.

   Machine checks cannot be disabled. Machine checks can be initiated immediately
   or deferred, depending on the error condition. Also, they can be deliberately
   generated by software.


6.1.2 Exceptions, Interrupts, and Machine Checks Summary 


The table below summarizes the actions taken on an exception, interrupt, or machine 
check. The remaining sections in this chapter describe these in greater detail. 


The “SavedPC” column describes what is saved in the “PC” field of the exception 
or interrupt or machine check stack frame. Here, 


1. “Current” indicates the PC of the instruction at which the exception or 
interrupt or machine check was taken, while 


2. “Next” indicates the PC of the successor instruction. 


The “NewMode” column specifies the mode and stack that the exception or 
interrupt or machine check routine will start with. For change mode traps, 
“MostPrv” indicates the more privileged of the current and new modes. 
The “R2” column specifies the value with which R2 is loaded, after its original
value has been saved in the exception or interrupt or machine check stack frame. 
The SCB vector quadword, “SCBv”, is loaded into R2 for all interrupts and 
exceptions and machine checks. 


The “R3” column specifies the value with which R3 is loaded, after its original 
value has been saved in the exception or interrupt or machine check stack frame. 
The SCB parameter quadword, “SCBp”, is loaded into R3 for all interrupts and 
exceptions and machine checks. 


The “R4” column specifies the value with which R4 is loaded, after its original 
value has been saved in the exception or interrupt or machine check stack frame. 
If the “R4” column is blank the value in R4 is UNPREDICTABLE on entry to an 
interrupt or exception. Here, 


1. “VA” indicates the exact virtual address which triggered a memory 
management fault or data alignment trap. 


2. “Mask” indicates the Register Write Mask. 


3. “LAOff” indicates the offset from the base of the logout area in the HWRPB;
see Section 6.5.2. 


The “R5” column specifies the value with which R5 is loaded, after its original 
value has been saved in the exception or interrupt or machine check stack frame. 
If the “R5” column is blank the value in R5 is UNPREDICTABLE on entry to an 
interrupt or exception or machine check. Here, 


1. “MMF” indicates the Memory Management Flags. 
2. “Exc” indicates the Exception Summary parameter.
3. “RW” indicates Read/Load = 0, Write/Store = 1 for data alignment traps.


Table 6-1: Exceptions, Interrupts, and Machine Checks Summary


                                SavedPC  NewMode  R2    R3    R4   R5

Exceptions - Faults
  Floating Disabled Fault       Current  Kernel   SCBv  SCBp
  Memory Management Faults
    Access Control Violation    Current  Kernel   SCBv  SCBp  VA   MMF
    Translation Not Valid       Current  Kernel   SCBv  SCBp  VA   MMF
    Fault on Read               Current  Kernel   SCBv  SCBp  VA   MMF
    Fault on Write              Current  Kernel   SCBv  SCBp  VA   MMF
    Fault on Execute            Current  Kernel   SCBv  SCBp  VA   MMF

Exceptions - Arithmetic Traps
  Arithmetic Traps              Next

Exceptions - Synchronous Traps
  Breakpoint Trap
  Bugcheck Trap
  Change Mode to K/E/S/U
  Illegal Instruction
  Illegal Operand
  Data Alignment Trap

Interrupts
  Asynch System Trap (4)
  Interval Clock
  Interprocessor Interrupt
  Software Interrupts
  Performance monitor
  Passive Release
  Powerfail
  I/O Device

Machine Checks
  Processor Correctable
  System Correctable
  System
  Processor
6.2 Processor State and Exception/Interrupt/Machine Check Stack Frame


Processor state consists of a quadword of privileged information called the Processor 
Status (PS) and a quadword containing the Program Counter (PC), which is the 
virtual address of the next instruction. 


When an exception, interrupt, or machine check is initiated, the processor state
in effect at that point must be preserved. This is accomplished by automatically
pushing the PS and the PC on the target stack.


Subsequently, instruction execution can be continued at the point of the exception, 
interrupt, or machine check by executing a CALL_PAL REI instruction; see 
Chapter 2. 


Process context such as memory mapping information is not saved or restored on 
each exception, interrupt, or machine check. Instead, it is saved and restored when 
process context switching is performed. Other processor status is changed even less 
frequently; see Chapter 4. 


6.2.1 Processor Status 


The PS can be explicitly read with the CALL_PAL RD_PS instruction. The PS<SW> 
field can be explicitly written with the CALL_PAL WR_PS_SW instruction. See 
Section 2.1. 


The terms current PS and saved PS are used to distinguish between this status 
information when it is stored internal to the processor and when copies of it are 
materialized in memory. 


Figure 6-1: Current Processor Status (PS Register) 
Figure 6-2: Saved Processor Status (PS on Stack)
Table 6-2: Processor Status Register Summary


Bits    Description

1:0     Reserved for Software (SW). These bits are reserved for software use and can be
        read and written at any time by the software, regardless of the current mode. The
        value of these bits is ignored by the hardware. The software field is set to zero at
        the initiation of either an exception or an interrupt.

2       Interrupt pending (IP). Set when an interrupt (software or hardware but NOT AST)
        is initiated; indicates an interrupt is in progress.

4:3     Current mode (CM). The access mode of the currently executing process as follows:

        0 - Kernel
        1 - Executive
        2 - Supervisor
        3 - User

6:5     Reserved to Digital, MBZ.

7       Virtual machine monitor (VMM). When set, the processor is executing in a virtual
        machine monitor. When clear, the processor is running in either real or virtual
        machine mode.

        PROGRAMMING NOTE
        This bit is only meaningful when
        running with PALcode that implements
        virtual machine capabilities.

12:8    Interrupt priority level (IPL). The current processor priority, in the range 0 to 31.

55:13   Reserved to Digital, MBZ.

61:56   Stack alignment (SP_ALIGN). The previous stack byte alignment within a 64-byte
        aligned area, in the range 0 to 63. This field is set in the saved PS during the act
        of taking an exception or interrupt; it is used by the CALL_PAL REI instruction to
        restore the previous stack byte alignment.

63:62   Reserved to Digital, MBZ.


At bootstrap, the initial value of PS is set to 1F00₁₆. Previous stack alignment is
zero, IPL is 31, VMM is clear, CM is Kernel, and the SW and IP fields are zero. 
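
The field layout summarized in Table 6-2 can be checked against the bootstrap
value with a short decoder. The Python sketch below is illustrative; the function
name is hypothetical, and the single-bit positions used for IP (bit 2) and VMM
(bit 7) are assumptions inferred from the gaps between the neighboring fields.

```python
MODES = ("Kernel", "Executive", "Supervisor", "User")

def decode_ps(ps):
    """Split a PS quadword into its named fields."""
    return {
        "SW": ps & 0x3,                   # bits 1:0, software field
        "IP": (ps >> 2) & 0x1,            # bit 2 (assumed position)
        "CM": MODES[(ps >> 3) & 0x3],     # bits 4:3, current mode
        "VMM": (ps >> 7) & 0x1,           # bit 7 (assumed position)
        "IPL": (ps >> 8) & 0x1F,          # bits 12:8
        "SP_ALIGN": (ps >> 56) & 0x3F,    # bits 61:56
    }
```

Applied to 1F00₁₆, the decoder yields IPL 31 and Kernel mode with every other
field zero, which matches the bootstrap state described above.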


6.2.2 Program Counter 


The PC is a 64-bit virtual address. All instructions are aligned on longword 
boundaries and, therefore, hardware can assume zero for the two low-order PC bits. 


The PC can be explicitly read with the Unconditional Branch (BR) instruction. All 
branching instructions also load a new value into the PC. 
Figure 6-3: Program Counter (PC)


63                                    2 1 0
Instruction Virtual Address <63:2>


6.2.3 Processor Interrupt Priority Level (IPL) 


Each processor has 32 interrupt priority levels (IPLs) divided into 16 software levels 
(numbered 0 to 15), and 16 hardware levels (numbered 16 to 31). User applications 
and most operating system software run at IPL 0, which may be thought of as process 
level. Higher numbered interrupt levels have higher priority; i.e., any request at an 
interrupt level higher than the processor’s current IPL will interrupt immediately, 
but requests at lower or equal levels are deferred. 
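
The priority rule stated above (a request interrupts only when its level is higher
than the current IPL, and the highest such request wins) can be written as a
one-line selection. This Python sketch is illustrative only; the function name and
the representation of pending requests as a collection of levels are hypothetical.

```python
def next_interrupt(pending_levels, current_ipl):
    """Return the level that would interrupt now, or None if all are deferred."""
    eligible = [level for level in pending_levels if level > current_ipl]
    return max(eligible) if eligible else None
```

For example, with requests pending at levels 3, 20, and 22 and the processor at
IPL 20, only the level 22 request is taken; the requests at 3 and 20 are deferred.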


Interrupt levels 0 to 15 exist solely for use by software. No hardware event can
request an interrupt on these levels. Conversely, interrupt levels 16 to 31 exist
solely for use by hardware. Serious system failures, such as a machine check abort,
raise the IPL to the highest level (31) to minimize processor interruption until the
problem is corrected, and execute in Kernel mode on the Kernel stack.


6.2.4 Protection Modes 


Each processor has four protection modes. The modes are Kernel, Executive, 
Supervisor, and User. Per-page memory protection varies as a function of mode (for 
example, a page can be made read-only in User mode, but read-write in Supervisor, 
Executive, or Kernel mode). 


For each process, there is a separate stack associated with each mode. Corruption 
of one stack does not affect use of the other stacks. 


Some instructions, termed privileged instructions, may only be executed in Kernel 
mode. 


6.2.5 Processor Stacks 


Each processor has four stacks. There are four process-specific stacks associated 
with the four modes of the current process. At any given time, only one of these 
stacks is actively used as the current stack. 


6.2.6 Stack Frames 


When an exception, interrupt, or machine check occurs, a stack frame is pushed
on the target stack. Regardless of the type of event notification, this stack frame
consists of a 64-byte-aligned structure containing the saved contents of registers
R2..R7, the Program Counter (PC), and the Processor Status (PS). Registers R2 and
R3 are then loaded with the vector and parameter from the SCB for the exception,
interrupt, or machine check. Registers R4 and R5 may be loaded with data
pertaining to the exception, interrupt, or machine check. The specific data loaded is
described below in conjunction with each exception, interrupt, or machine check; if
no specific data is specified, the contents of R4 and R5 are UNPREDICTABLE. After
the stack frame is built, the contents of registers R6 and R7 are UNPREDICTABLE.


The Program Counter value saved is that of the instruction encountering the
exception in the case of faults, that of the next instruction in the case of traps
and interrupts, and, on a best-effort basis, that of the next instruction in the
case of machine checks. Return from an exception, interrupt, or machine check is
done via the CALL_PAL REI instruction, which restores the saved values of PC, PS,
and R2..R7, thus re-executing the instruction in the case of faults, and proceeding
to the next instruction in the case of traps, interrupts, and machine checks.
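
The relationship between the 64-byte-aligned frame and the SP_ALIGN field of the
saved PS can be illustrated with simple address arithmetic. The Python sketch below
is a model under assumptions: the frame is taken to be exactly eight quadwords
(R2..R7, PC, PS), the stack grows toward lower addresses, and the function names
are hypothetical.

```python
FRAME_BYTES = 64   # R2..R7, PC, and PS: eight quadwords

def push_frame(sp):
    """Return (new_sp, sp_align) for a frame built below stack pointer sp."""
    sp_align = sp & 63                    # previous alignment within a 64-byte area
    new_sp = (sp & ~63) - FRAME_BYTES     # frame begins on a 64-byte boundary
    return new_sp, sp_align

def pop_frame(frame_sp, sp_align):
    """Recover the original stack pointer, as REI would from the saved PS."""
    return (frame_sp + FRAME_BYTES) | sp_align
```

Recording the low six bits of the old stack pointer is what lets the return path
undo the alignment step exactly, whatever the alignment was at the time of the
event.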


Figure 6-4: Stack Frame


6.3 Exceptions


Exception service routines execute in response to exception conditions caused by 
software. Most exception service routines execute in Kernel mode, on the Kernel 
stack; all exception service routines execute at the current processor IPL. Change 
Mode exception routines for CHMU/CHMS/CHME execute in the more privileged 
of the current mode or the target mode (U/S/E), on the matching stack. Exception 
service routines are usually coded to avoid exceptions; however, nested exceptions 
can occur. 


There are three types of exceptions: 


• A fault is an exception condition that occurs during an instruction and leaves
the registers and memory in a consistent state such that elimination of the fault 
condition and subsequent re-execution of the instruction will give correct results. 
Faults are not guaranteed to leave the machine in exactly the same state it was 
in immediately prior to the fault, but rather in a state such that the instruction 
can be correctly executed if the fault condition is removed. The PC saved in the 
exception stack frame is the address of the faulting instruction. A CALL_PAL 
REI instruction to this PC will reexecute the faulting instruction. 







• An arithmetic trap is an exception condition that occurs at the completion of
the operation that caused the exception. Since several instructions may be 
in various stages of execution at any point in time, it is possible for multiple 
arithmetic traps to occur simultaneously. The PC that is saved in the exception 
frame on traps is that of the next instruction that would have been issued if the 
trapping condition(s) had not occurred. This is not necessarily the address of the 
instruction immediately following the one(s) encountering the trap condition, and 
intervening instructions may have changed operands or other state used by the 
instruction(s) encountering the trap condition(s). A CALL_PAL REI instruction 
to this PC will not reexecute the trapping instruction(s), nor will it reexecute 
any intervening instructions; it will simply continue execution from the point at 
which the trap was taken. 


In general, it is difficult to fix up results and continue program execution at the
point of an arithmetic trap. Software can force a trap to be continued more easily 
without the need for complicated fixup code. This is accomplished by following 
a set of code-generation restrictions in code that could cause arithmetic traps 
which are to be completed by a software trap handler (see Common Architecture, 
Chapter 4), including specifying the /S software completion modifier in each such 
instruction. 


The AND of all the software completion modifiers for trapping instructions is
provided to the arithmetic trap handler in the exception summary SWC bit. If
SWC is set, a trap handler may find the trigger instruction by scanning backward 
from the trap PC until each register in the register write mask has been an 
instruction destination. The trigger instruction is the first instruction in I-stream 
order to get a trap within a trap shadow (see Common Architecture, Chapter 4 
for definition of trap shadow). If the SWC bit is clear, no fixup is possible (the 
trigger instruction may have been followed by a taken branch, so the trap PC 
cannot be used to find it). 


• A synchronous trap is an exception condition that occurs at the completion of
the operation that caused the exception (or, if the operation can only be partially 
carried out, at the completion of that part of the operation), and no subsequent 
instruction is issued before the trap occurs. 


Synchronous traps are divided into data alignment traps and all other 
synchronous traps. 
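
The backward scan that an arithmetic trap handler may perform when SWC is set
(described under arithmetic traps above) can be sketched as follows. This Python
model is illustrative only: instructions are represented as a hypothetical mapping
from PC to destination register number, the register write mask is given as a set
of register numbers rather than a bit mask, and the code-generation restrictions
that make the scan valid are assumed to hold.

```python
def find_trigger(instructions, trap_pc, write_mask):
    """Scan backward from trap_pc; return the PC of the trigger instruction.

    instructions: dict mapping PC -> destination register number (or None).
    write_mask:   set of register numbers reported in the register write mask.
    Returns None if the scan runs out of instructions first.
    """
    remaining = set(write_mask)
    pc = trap_pc - 4                  # instructions are longword aligned
    while remaining and pc in instructions:
        remaining.discard(instructions[pc])
        if not remaining:             # every masked register has been seen
            return pc
        pc -= 4
    return None
```

The scan stops at the earliest instruction needed to account for every register in
the mask, which corresponds to the first instruction in I-stream order to trap
within the trap shadow.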


6.3.1 Faults 


The six types of faults signal that an instruction or its operands are in some way 
illegal. These faults are all initiated in Kernel mode and push an exception stack 
frame onto the stack. Upon entry to the exception routine, the saved PC (in the 
exception stack frame) is the virtual address of the faulting instruction. 


The six faults include the Floating Disabled Fault described in the next subsection
and five memory management faults. 
Memory management faults occur when a virtual address translation encounters an
exception condition. This can occur as the result of instruction fetch or during a load 
or store operation. 


Immediately following a memory management fault, register R4 contains the exact 
virtual address encountering the fault condition. 


The register R5 contains the “MM Flag” quadword. 
“MM Flag” is set as follows: 

0000 0000 0000 0000₁₆    for a faulting data read
0000 0000 0000 0001₁₆    for a faulting I-fetch operation
8000 0000 0000 0000₁₆    for a faulting write operation
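
A fault handler might classify the access type from the “MM Flag” quadword in R5
along the following lines. The Python sketch is illustrative; the function name and
the returned strings are hypothetical, and values other than the three listed above
are simply reported as unknown.

```python
MM_FLAGS = {
    0x0000_0000_0000_0000: "data read",
    0x0000_0000_0000_0001: "instruction fetch",
    0x8000_0000_0000_0000: "write",
}

def decode_mm_flag(r5):
    """Map the MM Flag quadword delivered in R5 to the faulting access type."""
    return MM_FLAGS.get(r5 & 0xFFFF_FFFF_FFFF_FFFF, "unknown")
```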


The faulting instruction is the instruction whose fetch faulted, or the load, store, or 
PALcode instruction that encountered the fault condition. 
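
As a minimal sketch, a fault handler might classify the access from the MM Flag value in R5 as shown below. The enumeration and function names are hypothetical, and the handler is assumed to have already copied R5 into a 64-bit variable.

```c
#include <stdint.h>

/* Access kinds encoded by the MM Flag quadword (names are illustrative). */
typedef enum {
    MM_DATA_READ,
    MM_IFETCH,
    MM_WRITE,
    MM_UNKNOWN      /* any value not listed in the text above */
} mm_access_t;

/* Map the MM Flag quadword delivered in R5 to an access kind. */
static mm_access_t mm_access_from_flag(uint64_t mm_flag)
{
    switch (mm_flag) {
    case UINT64_C(0x0000000000000000): return MM_DATA_READ;
    case UINT64_C(0x0000000000000001): return MM_IFETCH;
    case UINT64_C(0x8000000000000000): return MM_WRITE;
    default:                           return MM_UNKNOWN;
    }
}
```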


Chapter 3 describes the memory management architecture of Alpha in more detail. 


6.3.1.1 Floating Disabled Fault


A Floating Disabled Fault is an exception that occurs when an attempt is made to 
execute a floating-point instruction and the floating enable (FEN) bit in the HWPCB 
is not set. 


6.3.1.2 Access Control Violation (ACV) Fault


An ACV fault is a memory management fault indicating that an attempted access 
to a virtual address was not allowed in the current mode. 


ACV faults usually indicate program errors, but in some cases, such as automatic 
stack expansion, can mean implicit operating system functions. 


ACV faults take precedence over Translation Not Valid, Fault on Read, Fault on 
Write, and Fault on Execute faults. 


ACV faults take precedence over Translation Not Valid faults so that a malicious 
user could not degrade system performance by causing spurious page faults to pages 
for which no access is allowed. 


6.3.1.3 Translation Not Valid (TNV)


A TNV fault is a memory management fault that indicates that an attempted access 
was made to a virtual address whose Page Table Entry (PTE) was not valid. 


Software may use TNV faults to implement virtual memory capabilities. 


6.3.1.4 Fault On Read (FOR)


An FOR fault is a memory management fault that indicates that an attempted data 
read access was made to a virtual address whose Page Table Entry (PTE) had the 
Fault on Read bit set. 





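As a part of initiating the FOR fault, the processor invalidates the Translation
Buffer entry that caused the fault to be generated.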


IMPLEMENTATION NOTE 
This allows an implementation only to invalidate entries 
from the Data-stream Translation Buffer on Fault On 
Read faults. 


Note that the Translation Buffer may reload and cache the old PTE value between 
the time when the FOR fault invalidates the old value from the Translation Buffer 
and the time when software updates the PTE in memory. Software that depends on 
the processor-provided invalidate must thus be prepared to take another FOR fault 
on a page after clearing the page’s PTE<FOR> bit. The second fault will invalidate 
the stale PTE from the Translation Buffer, and the processor cannot load another 
stale copy. Thus in the worst case, a multiprocessor system will take an initial FOR 
fault and then an additional FOR fault on each processor. In practice, even a single 
repetition is unlikely. 


Software may use FOR faults to implement watchpoints, to collect page usage 
statistics, and to implement execute-only pages. 


6.3.1.5 Fault On Write (FOW)


A FOW fault is a memory management fault that indicates that an attempted data 
write access was made to a virtual address whose Page Table Entry (PTE) had the 
Fault On Write bit set. 


As a part of initiating the FOW fault, the processor invalidates the Translation 
Buffer entry that caused the fault to be generated. 


IMPLEMENTATION NOTE 
This allows an implementation only to invalidate entries 
from the Data-stream Translation Buffer on Fault On 
Write faults. 


Note that the Translation Buffer may reload and cache the old PTE value between 
the time when the FOW fault invalidates the old value from the Translation Buffer 
and the time when software updates the PTE in memory. Software that depends on 
the processor-provided invalidate must thus be prepared to take another FOW fault 
on a page after clearing the page’s PTE<FOW> bit. The second fault will invalidate
the stale PTE from the Translation Buffer, and the processor cannot load another 
stale copy. Thus in the worst case, a multiprocessor system will take an initial FOW 
fault and then an additional FOW fault on each processor. In practice, even a single 
repetition is unlikely. 


Software may use FOW faults to maintain modified page information, to implement 
copy on write and watchpoint capabilities, and to collect page usage statistics. 


6.3.1.6 Fault On Execute (FOE)


An FOE fault is a memory management fault indicating that an attempted 
instruction stream access was made to a virtual address whose Page Table Entry 
(PTE) had the Fault On Execute bit set.


As a part of initiating the FOE fault, the processor invalidates the Translation Buffer
entry that caused the fault to be generated. 


IMPLEMENTATION NOTE 
This allows an implementation only to invalidate entries 
from the Instruction-stream Translation Buffer on Fault 
On Execute faults.


Note that the Translation Buffer may reload and cache the old PTE value between 
the time when the FOE fault invalidates the old value from the Translation Buffer 
and the time when software updates the PTE in memory. Software that depends on 
the processor-provided invalidate must thus be prepared to take another FOE fault 
on a page after clearing the page’s PTE<FOE> bit. The second fault will invalidate 
the stale PTE from the Translation Buffer, and the processor cannot load another 
stale copy. Thus in the worst case, a multiprocessor system will take an initial FOE 
fault and then an additional FOE fault on each processor. In practice, even a single
repetition is unlikely. 


Software may use FOE faults to implement access mode changes and protected entry 
to Kernel mode, to collect page usage statistics, and to detect programming errors 
that try to execute data. 


6.3.2 Arithmetic Traps 


An arithmetic trap is an exception that occurs as the result of performing an 
arithmetic or conversion operation. 


If integer register R31 or floating register F31 is specified as the destination of an
operation that can cause an arithmetic trap, it is UNPREDICTABLE whether the 
trap will actually occur, even if the operation would definitely produce an exceptional 
result. 


Arithmetic traps are initiated in Kernel mode and push the exception stack frame 
on the Kernel stack. The Register Write Mask is saved in R4, and the Exception 
Summary parameter is saved in R5. These are described below. 


When an arithmetic exception condition is detected, several instructions may be 
in various stages of execution. These instructions are allowed to complete before 
the arithmetic trap can be initiated. Some of these instructions may themselves 
cause further arithmetic traps. Thus it is possible for several arithmetic traps to be
reported simultaneously. 


It is also possible for the result of an instruction that causes an arithmetic trap to 
be used as an operand in a subsequent instruction before the trap is taken. If this
would produce undesired behavior, software is responsible for inserting appropriate 
TRAPB instructions to cause the trap to be recognized before the result is used. 


Integer exceptional results (integer overflow) can be forwarded to the address 
calculation for load and store instructions, to the address calculation for jump 
instructions, as the source data for a store instruction, or as the source data for a 
conditional branch instruction. This can result in the generation of an inappropriate 
address, the storing of exceptional results in memory, or an unintended branch.


If this would produce undesired behavior, software is responsible for inserting
appropriate TRAPB instructions to cause the trap to be recognized before the result
is used.


6.3.2.1 Exception Summary Parameter


The Exception Summary parameter records the various types of arithmetic traps 
that can occur together. These types of traps are described in subsections below. 


Figure 6-5: Exception Summary


Table 6-3: Exception Summary

Bit  Description

0    Software Completion (SWC)

     Is set when all of the other arithmetic exception bits were set by floating-operate
     instructions with the /S software completion trap modifier set. See Common
     Architecture, Chapter 4 for rules about setting the /S modifier in code that may cause
     an arithmetic trap, and Section 6.3 for rules about using the SWC bit in a trap handler.

1    Invalid Operation (INV)

     An attempt was made to perform a floating arithmetic, conversion, or comparison
     operation, and one or more of the operand values were illegal.

2    Division by Zero (DZE)

     An attempt was made to perform a floating divide operation with a divisor of zero.

3    Overflow (OVF)

     A floating arithmetic or conversion operation overflowed the destination exponent.

4    Underflow (UNF)

     A floating arithmetic or conversion operation underflowed the destination exponent.

5    Inexact Result (INE)

     A floating arithmetic or conversion operation gave a result that differed from the
     mathematically exact result.

6    Integer Overflow (IOV)

     An integer arithmetic operation or a conversion from floating to integer overflowed the
     destination precision.


6.3.2.2 Register Write Mask


The Register Write Mask parameter records all registers that were targets of
instructions that set the bits in the exception summary register. There is a
one-to-one correspondence between bits in the Register Write Mask quadword and the
register numbers. The quadword records, starting at bit 0 and proceeding right
to left, which of the registers R0 through R31, then F0 through F31, received an
exceptional result.


NOTE 
For a sequence such as: 
ADDF F1,F2,F3 
MULF F4,F5,F3 


if the add overflows and the multiply does not, the OVF 
bit is set in the exception summary, and the F3 bit is 
set in the register mask, even though the overflowed 
sum in F3 can be overwritten with an in-range product 
by the time the trap is taken. (This code violates the 
destination reuse rule for software completion. See 
Common Architecture, Chapter 4 for the destination 
reuse rules.)


The PC value saved in the exception stack frame is the virtual address of the next 
instruction. This is defined as the virtual address of the first instruction not executed 
after the trap condition was recognized. 
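
To make the two arithmetic trap parameters concrete, the following sketch decodes an Exception Summary value (delivered in R5) and a Register Write Mask value (delivered in R4). The bit positions follow Table 6-3 and the mask layout described above; the constant and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Exception Summary bits, per Table 6-3 (names are illustrative). */
enum {
    EXC_SWC = 1u << 0,  /* software completion */
    EXC_INV = 1u << 1,  /* invalid operation */
    EXC_DZE = 1u << 2,  /* division by zero */
    EXC_OVF = 1u << 3,  /* overflow */
    EXC_UNF = 1u << 4,  /* underflow */
    EXC_INE = 1u << 5,  /* inexact result */
    EXC_IOV = 1u << 6   /* integer overflow */
};

/* Nonzero when software completion may be attempted for this trap. */
static int exc_software_completion(uint64_t summary)
{
    return (summary & EXC_SWC) != 0;
}

/* Write the names of all registers flagged in the Register Write Mask into
 * out, separated by spaces (bits 0-31 are R0-R31, bits 32-63 are F0-F31).
 * Returns the number of characters written; stops early if out is full. */
static size_t format_write_mask(uint64_t mask, char *out, size_t len)
{
    size_t used = 0;

    if (len > 0)
        out[0] = '\0';
    for (unsigned bit = 0; bit < 64; bit++) {
        if (!((mask >> bit) & 1))
            continue;
        int n = snprintf(out + used, len - used, "%s%c%u",
                         used ? " " : "", bit < 32 ? 'R' : 'F', bit % 32);
        if (n < 0 || (size_t)n >= len - used) {
            if (used < len)
                out[used] = '\0';   /* drop the partially written name */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

For the ADDF/MULF sequence in the note above, the summary would have OVF set and the mask would have bit 35 set, which this sketch reports as F3.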


6.3.2.3 Invalid Operation (INV) Trap


An INV trap is reported for most floating-point operate instructions with an input 
operand that is a VAX reserved operand, VAX dirty zero, IEEE NaN, IEEE infinity, 
or IEEE denormal. 


Floating INV traps are always enabled. If this trap occurs, the result register is 
written with an UNPREDICTABLE value. 


6.3.2.4 Division by Zero (DZE) Trap


A DZE trap is reported when a finite number is divided by zero. Floating DZE 
traps are always enabled. If this trap occurs, the result register is written with an 
UNPREDICTABLE value. 


6.3.2.5 Overflow (OVF) Trap


An OVF trap is reported when the destination’s largest finite number is exceeded in 
magnitude by the rounded true result. Floating OVF traps are always enabled. If 
this trap occurs, the result register is written with an UNPREDICTABLE value.


6.3.2.6 Underflow (UNF) Trap


A UNF trap is reported when the destination’s smallest finite number exceeds in 
magnitude the non-zero rounded true result. Floating UNF trap enable can be 
specified in each floating-point operate instruction. If underflow occurs, the result 
register is written with a true zero. 


6.3.2.7 Inexact Result (INE) Trap 


An INE trap is reported if the rounded result of an IEEE operation is not exact. 
INE trap enable can be specified in each IEEE floating-point operate instruction. 
The unchanged result value is stored in all cases. 


6.3.2.8 Integer Overflow (IOV) Trap 


An IOV trap is reported for any integer operation whose true result exceeds the 
destination register size. IOV trap enable can be specified in each arithmetic integer 
operate instruction and each floating-point convert-to-integer instruction. If integer 
overflow occurs, the result register is written with the truncated true result. 


6.3.3 Synchronous Traps 


A synchronous trap is an exception condition that occurs at the completion of the 
operation that caused the exception (or, if the operation can only be partially carried 
out, at the completion of that part of the operation), but no successor instruction is 
allowed to start. All traps that are not arithmetic traps are synchronous traps. 


Some synchronous traps are caused by PALcode instructions: BPT, BUGCHK, 
CHMU, CHMS, CHME, and CHMK. For synchronous traps, the PC saved in the 
exception stack frame is the address of the instruction immediately following the one 
causing the trap condition. A CALL_PAL REI instruction to this PC will continue 
without reexecuting the trapping instruction. The following subsections describe the 
synchronous traps in detail. 


6.3.3.1 Data Alignment Trap 


All data must be naturally aligned or an alignment trap may be generated. Natural 
alignment means that data bytes are on byte boundaries, data words are on word 
boundaries, data longwords are on longword boundaries, and data quadwords are 
on quadword boundaries. 


A Data Alignment trap is generated by the hardware when an attempt is made to 
load or store a longword or quadword to/from a register using an address that does 
not have the natural alignment of the particular data reference. 


Data alignment traps are fixed up by the PALcode and are optionally reported to the 
operating system under the control of the DAT bit. If the bit is zero, the trap will 
be reported. If the bit is set, after the alignment is corrected, control is returned to 
the user. In either case, if the PALcode detects a LDx_L or STx_C instruction, no 
correction is possible and an illegal operand exception is generated. 


The system software is notified via the generation of a Kernel mode exception 
through the Unaligned_Access SCB vector (280₁₆). The virtual address of the
unaligned data being accessed is stored in R4. R5 indicates whether the operation
was a read or a write (0 = read/load, 1 = write/store).


PALcode may write partial results to memory without probing to make sure all
writes will succeed when dealing with unaligned store operations. 


If a memory management exception condition occurs while reading or writing part 
of the unaligned data, the appropriate memory management fault is generated. 


Software should avoid data misalignment whenever possible since the emulation 
performance penalty may be as large as 100 to 1. 
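
The natural alignment rule, and the kind of byte-by-byte access a fixup routine falls back on, can be illustrated as follows. This is a sketch under stated assumptions rather than the PALcode algorithm: the function names are hypothetical, the access size is assumed to be a power of two, and little-endian byte order is assumed for the emulated quadword load.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when va is naturally aligned for an access of size bytes
 * (size must be a power of two: 1, 2, 4, or 8). */
static bool is_naturally_aligned(uint64_t va, size_t size)
{
    return (va & (uint64_t)(size - 1)) == 0;
}

/* Assemble a quadword one byte at a time, as an emulation routine might do
 * for an unaligned load. Assumes little-endian byte order. */
static uint64_t load_quad_bytewise(const uint8_t *p)
{
    uint64_t value = 0;

    for (int i = 7; i >= 0; i--)
        value = (value << 8) | p[i];
    return value;
}
```

Eight single-byte accesses plus the trap overhead stand in for one aligned access, which gives a sense of why the emulation penalty quoted above can be so large.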


The Data Alignment trap control bit is included in the HWPCB at offset +56 bit 63. 
In order to change this bit for the currently executing process, the DATFX IPR may
be written via a CALL_PAL MTPR_DATFX instruction. This operation will also 
update the value in the HWPCB. 


6.3.3.2 Other Synchronous Traps 


With the traps described in this subsection, the SCB vector quadword is saved in 
R2 and the SCB parameter quadword is saved in R3. The change mode traps are 
initiated in the more privileged of the current mode and the target mode, while the 
other traps are initiated in Kernel mode. 


6.3.3.2.1 Breakpoint Trap 


A Breakpoint trap is an exception that occurs when a CALL_PAL BPT instruction
is executed; see Chapter 2. Breakpoint traps are intended for use by debuggers and
can be used to place breakpoints in a program. 


Breakpoint traps are initiated in Kernel mode so that system debuggers can capture 
breakpoint traps that occur while the user is executing system code. 


6.3.3.2.2 Bugcheck Trap 


A Bugcheck trap is an exception that occurs when a CALL_PAL BUGCHK 
instruction is executed; see Chapter 2. Bugchecks are used to log errors detected by 
software. 


6.3.3.2.3 Illegal Instruction Trap 


An Illegal Instruction Trap is an exception that occurs when an attempt is made
to execute an instruction whose opcode is reserved to Digital, is a subsetted opcode 
that requires emulation on the host implementation, or is a privileged instruction 
and the current mode is not Kernel. 


6.3.3.2.4 Illegal Operand Trap


An Illegal Operand Trap occurs when an attempt is made to execute PALcode with 
operand values that are illegal or reserved for future use by Digital. 


Illegal operands include: 


• An invalid combination of bits in the PS restored by the CALL_PAL REI
instruction.

• An unaligned operand passed to PALcode.


6.3.3.2.5 Generate Software Trap 


A Generate Software Trap is an exception that occurs when a CALL_PAL GENTRAP 
instruction is executed; see Chapter 2. The intended use is for low-level compiler- 
generated code that detects conditions such as divide-by-zero, range errors, subscript 
bounds and negative string lengths. 


6.3.3.2.6 Change Mode to Kernel Trap 


A Change Mode to Kernel trap is an exception that occurs when a CALL_PAL CHMK 
instruction is executed; see Chapter 2. Change Mode to Kernel traps are initiated 
in Kernel mode and push the exception frame on the Kernel stack. 


6.3.3.2.7 Change Mode to Executive Trap 


A Change Mode to Executive trap is an exception that occurs when a CALL_PAL 
CHME instruction is executed; see Chapter 2. Change Mode to Executive traps are 
initiated in the more privileged of the current mode and Executive mode, and push 
the exception frame on the target stack. 


6.3.3.2.8 Change Mode to Supervisor Trap 


A Change Mode to Supervisor trap is an exception that occurs when a CALL_PAL 
CHMS instruction is executed; see Chapter 2. Change Mode to Supervisor traps are 
initiated in the more privileged of the current mode and Supervisor mode, and push 
the exception frame on the target stack. 


6.3.3.2.9 Change Mode to User Trap 


A Change Mode to User trap is an exception that occurs when a CALL_PAL CHMU 
instruction is executed; see Chapter 2. Change Mode to User traps are initiated 
in the more privileged of the current mode and User mode, and push the exception 
frame on the target stack. 
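
The rule that a change mode trap is initiated in the more privileged of the current mode and the target mode reduces to taking a minimum, provided modes are numbered so that a smaller value is more privileged. The sketch below assumes the encoding Kernel = 0, Executive = 1, Supervisor = 2, User = 3; the type and function names are hypothetical.

```c
/* Access modes, numbered so that a smaller value is more privileged
 * (assumed encoding for this sketch). */
typedef enum {
    MODE_KERNEL = 0,
    MODE_EXECUTIVE = 1,
    MODE_SUPERVISOR = 2,
    MODE_USER = 3
} cpu_mode_t;

/* Mode in which a change mode trap to target is initiated when executed
 * from current: the more privileged of the two. */
static cpu_mode_t change_mode_entry(cpu_mode_t current, cpu_mode_t target)
{
    return current < target ? current : target;
}
```

Under this rule a CHMU executed in Executive mode stays in Executive mode, so a change mode instruction can never lower the privilege of the caller.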


6.4 Interrupts 


The processor arbitrates interrupt requests according to priority. When the priority 
of an interrupt request is higher than the current processor IPL, the processor will 
raise the IPL and service the interrupt request. The interrupt service routine is 
entered at the IPL of the interrupting source, in Kernel mode, and on the Kernel
stack. Interrupt requests can come from I/O devices, memory controllers, other
processors, or the processor itself. 


The priority level of one processor does not affect the priority level of other 
processors. Thus, in a multiprocessor system, interrupt levels alone cannot be used 
to synchronize access to shared resources. 


Synchronization with other processors in a multiprocessor system involves a 
combination of raising the IPL and executing an interlocking instruction sequence.
Raising the IPL prevents the synchronization sequence itself from being interrupted 
on a single processor while the interlock sequence guarantees mutual exclusion 
with other processors. Alternatively, one processor can issue explicit interprocessor
interrupts (and wait for acknowledgment) to put other processors in a known
software state, thus achieving mutual exclusion. 


In some implementations, several instructions may be in various stages of execution
simultaneously. Before the processor can service an interrupt request, all active 
instructions must be allowed to complete without exception. Thus, when an 
exception occurs in a currently active instruction, the exception is initiated and 
the exception stack frame built immediately before the interrupt is initiated and its 
stack frame built. 


The following events will cause an interrupt:

• Software interrupts - IPL 1 to 15.
• Asynchronous System Traps - IPL 2.
• Passive Release interrupts - IPL 20 to 23.
• I/O Device interrupts - IPL 20 to 23.
• Interval Clock interrupt - IPL 22.
• Interprocessor interrupt - IPL 22.
• Performance Monitor interrupt - IPL 29.
• Powerfail interrupt - IPL 30.


Interrupts are initiated in Kernel mode and push the interrupt stack frame of eight 
quadwords onto the Kernel stack. The PC saved in the interrupt stack frame is 
the virtual address of the first instruction not executed after the interrupt condition 
was recognized. A CALL_PAL REI instruction to the saved PC/PS will continue 
execution at the point of interrupt. 
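
The arbitration rule stated at the start of this section (a request is serviced only when its priority is higher than the current processor IPL) can be modeled with a pending-request bit mask. This is an illustrative model only; the mask representation and the name `next_interrupt` are assumptions, not architected state.

```c
#include <stdint.h>

/* pending: bit n set means an interrupt request is pending at IPL n.
 * Returns the IPL of the highest-priority request that exceeds current_ipl,
 * or -1 if every pending request must remain deferred. */
static int next_interrupt(uint32_t pending, unsigned current_ipl)
{
    for (int ipl = 31; ipl > (int)current_ipl; ipl--) {
        if (pending & (UINT32_C(1) << ipl))
            return ipl;   /* service routine is entered at this IPL */
    }
    return -1;
}
```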


Each interrupt source has a separate vector location (offset) within the System 
Control Block (SCB); see Section 6.6. With the exception of I/O device interrupts, 
each of the above events has a unique fixed vector. I/O device interrupts occupy a 
range of vectors that can be both statically and dynamically assigned. Upon entry to 
the interrupt service routine, R2 contains the SCB vector quadword and R3 contains 
the SCB parameter quadword. For Corrected Error interrupts, R4 optionally locates 
additional information; see Section 6.5.2. 


In order to reduce interrupt overhead, no memory mapping information is changed 
when an interrupt occurs. Therefore, the instructions, data, and the contents of the 
interrupt vector for the interrupt service routine must be present in every process 
at the same virtual address. 


Interrupt service routines should follow the discipline of not lowering IPL below 
their initial level. Lowering IPL in this way could result in an interrupt at an 
intermediate level which would cause the stack nesting to be incorrect. 


Kernel mode software may need to raise and lower IPL during certain instruction 
sequences that must synchronize with possible interrupt conditions (such as 
powerfail). This can be accomplished by specifying the desired IPL and executing 
a CALL_PAL MTPR_IPL instruction or by executing a CALL_PAL REI instruction
that restores a PS that contains the desired IPL; see Chapter 2. 


6.4.1 Software Interrupts - IPLs 1 to 15


6.4.1.1 Software Interrupt Summary Register


The architecture provides fifteen priority interrupt levels for use by software (level 
0 is also available for use by software but interrupts can never occur at this level). 
The Software Interrupt Summary Register (SISR) stores a mask of pending software 
interrupts. Bit positions in this mask which contain a 1 correspond to the levels on
which software interrupts are pending. 


When the processor IPL drops below that of the highest requested software interrupt, 
a software interrupt is initiated and the corresponding bit in the SISR is cleared. 


The SISR is a read-only internal processor register which may be read by Kernel 
mode software by executing a CALL_PAL MFPR_SISR instruction; see Section 5.3. 


6.4.1.2 Software Interrupt Request Register 


The Software Interrupt Request Register (SIRR) is a write-only internal processor 
register used for making software interrupt requests. 


Kernel mode software may request a software interrupt at a particular level by 
executing a CALL_PAL MTPR_SIRR instruction; see Section 5.3. 


If the requested interrupt level is greater than the current IPL, the interrupt will 
occur before the execution of the next instruction. If, however, the requested level is 
equal to or less than the current processor IPL, the interrupt request will be recorded 
in the Software Interrupt Summary Register (SISR) and deferred until the processor 
IPL drops to the appropriate level. 


Note that no indication is given if there is already a request at the specified level. 
Therefore, the respective interrupt service routine must not assume that there is a 
one-to-one correspondence between interrupts requested and interrupts generated. 
A valid protocol for generating this correspondence is: 


1. The requester places information in a control block and then inserts the control 
block in a queue associated with the respective software interrupt level. 


2. The requester uses CALL_PAL MTPR_SIRR to request an interrupt at the 
appropriate level. 


3. When enabling conditions arise, processor hardware clears the appropriate SISR bit
as part of initiating the software interrupt.


4. The interrupt service routine attempts to remove a control block from the request
queue. If there are no control blocks in the queue, the interrupt is dismissed with 
a CALL_PAL REI instruction. 


5. If a valid control block is removed from the queue, the requested service is 
performed and Step 4 is repeated.
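

The queue-based protocol above can be sketched as a small single-threaded model. Everything here is an assumption made for illustration: `sisr` is a plain variable standing in for the SISR, `request_soft_interrupt` stands in for queueing a control block and executing CALL_PAL MTPR_SIRR, and `lower_ipl` imitates the hardware behavior when the IPL drops. A real system would also need interlocked queue operations, which are omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* One queued request (the "control block" of step 1). */
typedef struct request {
    struct request *next;
} request_t;

typedef struct {
    request_t *head;
    request_t *tail;
} queue_t;

static uint32_t sisr;          /* model of SISR: bit n set = level n pending */
static queue_t  queues[16];    /* one request queue per software level */
static unsigned serviced[16];  /* requests handled per level */

static void enqueue(queue_t *q, request_t *r)
{
    r->next = NULL;
    if (q->tail)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
}

static request_t *dequeue(queue_t *q)
{
    request_t *r = q->head;

    if (r) {
        q->head = r->next;
        if (!q->head)
            q->tail = NULL;
    }
    return r;
}

/* Steps 1 and 2: queue the control block, then request the interrupt.
 * A second request at the same level leaves the SISR bit unchanged. */
static void request_soft_interrupt(unsigned level, request_t *r)
{
    enqueue(&queues[level], r);
    sisr |= UINT32_C(1) << level;
}

/* Steps 4 and 5: drain the queue, then dismiss the interrupt. */
static void soft_isr(unsigned level)
{
    while (dequeue(&queues[level]) != NULL)
        serviced[level]++;
}

/* Step 3: when IPL drops below a pending level, clear its SISR bit and
 * enter the service routine, highest level first. */
static void lower_ipl(unsigned new_ipl)
{
    for (unsigned level = 15; level > new_ipl; level--) {
        if (sisr & (UINT32_C(1) << level)) {
            sisr &= ~(UINT32_C(1) << level);
            soft_isr(level);
        }
    }
}
```

Two requests at the same level produce a single interrupt in this model, and the service routine still handles both control blocks because it drains the queue rather than counting interrupts.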


6.4.2 Asynchronous System Trap - IPL 2


Asynchronous System Traps (ASTs) are a means of notifying a process of events that
are not synchronized with its execution, but which must be dealt with in the context
of the process. An AST is initiated in Kernel mode at IPL 2 when the current mode
is less privileged than or equal to a mode for which an AST is pending and not
disabled, with PS<IPL> less than 2; see Sections 6.7.6 and 4.3.


There are four separate per-mode SCB vectors, one for each of Kernel, Executive,
Supervisor, and User modes.


On encountering an AST, the interrupt stack frame is pushed on the Kernel stack; 
the value of the PC saved in this stack frame is the address of the next instruction 
to have been executed if the interrupt had not occurred. The SCB vector quadword 
is saved in R2 and the SCB parameter quadword in R3. 
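
The AST initiation condition can be expressed as a small predicate. The sketch assumes that pending and enabled ASTs are held in two 4-bit masks with one bit per mode (bit 0 = Kernel through bit 3 = User), and that modes are numbered with Kernel = 0 as the most privileged. The parameter names `astsr` and `asten` and the choice to report the most privileged eligible mode are assumptions of this sketch, not statements from this section.

```c
/* Returns the mode (0 = Kernel .. 3 = User) whose AST would be initiated,
 * or -1 if no AST can be initiated now.
 *   astsr        - pending ASTs, one bit per mode (assumed layout)
 *   asten        - enabled ASTs, one bit per mode (assumed layout)
 *   current_mode - current access mode, 0..3
 *   ipl          - current PS<IPL> */
static int pending_ast_mode(unsigned astsr, unsigned asten,
                            unsigned current_mode, unsigned ipl)
{
    unsigned ready = astsr & asten & 0xFu;

    if (ipl >= 2)
        return -1;      /* AST interrupts are taken at IPL 2 */

    /* Eligible modes are those at least as privileged as the current mode. */
    for (unsigned mode = 0; mode <= current_mode && mode < 4; mode++) {
        if (ready & (1u << mode))
            return (int)mode;
    }
    return -1;
}
```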


6.4.3 Passive Release Interrupts - IPLs 20 to 23


Passive releases occur when the source of an interrupt granted by a processor cannot 
be determined. This can happen when the requesting I/O device determines that it 
no longer requires an interrupt after requesting one, or when a previously requested 
interrupt has already been serviced by another processor in some multiprocessor 
configurations. The interrupt handler for passive releases executes at the priority 
level of the interrupt request. 


6.4.4 I/O Device Interrupts - IPLs 20 to 23 


The architecture provides four priority levels for use by I/O devices. I/O device 
interrupts are requested when the device encounters a completion, attention, or 
error condition and the respective interrupt is enabled. \ See Platform Section, 
Chapter 3 for more information. \ 


6.4.5 Interval Clock Interrupt - IPL 22


The Interval Clock requests an interrupt periodically.


At least 1000 interval clock interrupts occur per second. An entry in the HWRPB 
contains the number of interval clock interrupts per second that occur in an actual 
Alpha implementation, scaled up by 4096, and rounded to a 64-bit integer. \ (See 
Platform Section, Chapter 3.) \ 
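
As a worked example of the scaled value, the sketch below computes interrupts per second multiplied by 4096 for a clock derived by dividing a source frequency, using the 10 MHz source and 13-bit counter cited in this section's hardware/software note. The function names are hypothetical, and round-to-nearest is an assumption about how the rounding is done.

```c
#include <stdbool.h>
#include <stdint.h>

/* Interval clock interrupts per second, scaled up by 4096 and rounded to
 * the nearest integer, for a clock that divides source_hz by divisor. */
static uint64_t scaled_clock_frequency(uint64_t source_hz, uint64_t divisor)
{
    return (source_hz * 4096 + divisor / 2) / divisor;
}

/* True when the scaled value satisfies the minimum of 1000 interrupts
 * per second. */
static bool meets_minimum_rate(uint64_t scaled)
{
    return scaled >= UINT64_C(1000) * 4096;
}
```

With a 10 MHz source and a divisor of 8192 (a 13-bit counter), the interval is 819.2 usec, the rate is 1220.703125 interrupts per second, and the scaled value is exactly 5,000,000.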


The accuracy of the interval clock must be at least 50 parts per million (ppm). 


HARDWARE/SOFTWARE NOTE
For example, an interval of 819.2 usec derived from a 10 
MHz Ethernet clock and a 13-bit counter is acceptable. 


To guarantee software progress, the interval clock 
interrupt should be no more frequent than the time it 
takes to do 500 main memory accesses. Over the life of 
the architecture, this interval may well decrease much 
more slowly than CPU cycle time decreases.


Other constraints may apply to Secure Kernel systems.


6.4.5.1 Interprocessor Interrupt - IPL 22 


Interprocessor interrupts are provided to enable operating system software running 
on one processor to interrupt activity on another processor and cause operating 
system dependent actions to be performed. 


6.4.5.1.1 Interprocessor Interrupt Request Register 


The Interprocessor Interrupt Request Register (IPIR) is a write-only internal 
processor register used for making a request to interrupt a specific processor. 


Kernel mode software may request to interrupt a particular processor by executing 
a CALL_PAL MTPR_IPIR instruction; see Section 5.3. 


If the specified processor is the same as the current processor and the current IPL is 
less than 22, then the interrupt may be delayed and not initiated before the execution 
of the next instruction. 


Note that, like software interrupts, no indication is given as to whether there is 
already an interprocessor interrupt pending when one is requested. Therefore, 
the interprocessor interrupt service routine must not assume there is a one-to-one 
correspondence between interrupts requested and interrupts generated. A valid 
protocol similar to the one for software interrupts for generating this correspondence 
is: 

1. The requester places information in a control block and then inserts the control
block in a queue associated with the target processor.


2. The requester uses CALL_PAL MTPR_IPIR to request an interprocessor 
interrupt on the target processor. 


3. The interprocessor interrupt service routine on the target processor attempts to 
remove a control block from its request queue. If there are no control blocks 
remaining, the interrupt is dismissed with a CALL_PAL REI instruction. 


4. If a valid control block is removed from the queue, the specified action is 
performed and Step 3 is repeated.


6.4.6 Performance Monitor Interrupts - IPL 29


These interrupts provide some of the support for processor or system performance 
measurements. The implementation is processor or system specific. 


6.4.7 Powerfail Interrupt - IPL 30 


If the system power supply backup option permits powerfail recovery, a Powerfail 
interrupt is generated to each processor when power is about to fail. \ See Platform 
Section, Chapter 4 for a description of powerfail recovery requirements, and for 
a description of the interactions between system software and the console during 
system restarts. \ 


In systems in which the backup option maintains only the contents of memory and 
keeps system time with the BB_WATCH, the power supply requests a powerfail 
interrupt to permit volatile system state to be saved. Prior to dispatching to the
powerfail interrupt service routine, PALcode is responsible for saving all system
state which is not visible to system software. Such state includes, but is not limited 
to, processor internal registers and PALcode temporary variables. 


PALcode is also responsible for saving the contents of any writeback caches 
or buffers, including the powerfail interrupt stack frame. System software is 
responsible for saving all other system state. Such state includes, but is not limited 
to, processor registers and writeback cache contents. State can be saved by forcing 
all written data to a backed-up part of the memory subsystem; software may use 
the CALL_PAL CFLUSH instruction.


The Powerfail interrupt will not be initiated until the processor IPL drops below 
30. Thus, critical code sequences can block the power-down sequence by raising the 
IPL to 31. Software, however, must take extra care not to lock out the power-down 
sequence for an extended period of time. \The time interval is platform specific. \ 


Explicit state is not provided by the architecture for software to directly determine 
whether there were outstanding interrupts when powerfail occurred. It is the 
responsibility of software to leave sufficient information in memory so that it may 
determine the proper action on power-up. 


6.5 Machine Checks 


A Machine Check, or mcheck, indicates that a hardware error condition was detected 
and may or may not be successfully corrected by hardware or PALcode. Such 
error conditions can occur either synchronously or asynchronously with respect to 
instruction execution. There are four types: 


1. System Machine Check (IPL 31) 


These machine checks are generated by error conditions which are detected 
asynchronously to processor execution but are not successfully corrected by 
hardware or PALcode. Examples of system machine check conditions include 
protocol errors on the processor-memory-interconnect and unrecoverable memory 
errors. 


System machine checks are always maskable and deferred until processor IPL 
drops below IPL 31. 


2. Processor Machine Check (IPL 31) 


These machine checks indicate that a processor internal error was detected 
and not successfully corrected by hardware or PALcode. Examples of processor 
machine check conditions include processor internal cache errors, translation 
buffer parity errors, or read access to a non-existent local I/O space location 
(NXM). 


Processor machine checks may be nonmaskable or maskable. If nonmaskable, 
they are initiated immediately, even if the processor IPL is 31. If maskable, they 
are deferred until processor IPL drops below IPL 31. 


3. System Correctable Machine Check (IPL 20) 


These machine checks are generated by error conditions that are detected 
asynchronously to processor execution and are successfully corrected by 
hardware or PALcode. Examples of system correctable machine check conditions 
include single bit errors within the memory subsystem. 


System correctable machine checks are always maskable and deferred until 
processor IPL drops below IPL 20. 


4. Processor Correctable Machine Check (IPL 31) 


These machine checks indicate that a processor internal error was detected 
and successfully corrected by hardware or PALcode. Examples of processor 
correctable machine check conditions include corrected processor internal cache 
errors and corrected translation buffer tag errors. 


Processor correctable machine checks may be nonmaskable or maskable. If 
nonmaskable, they are initiated immediately, even if the processor IPL is 31. 
If maskable, they are deferred until processor IPL drops below IPL 31. 


Machine Checks are initiated in Kernel mode, on the Kernel stack, and cannot be 
disabled. 


Correctable machine checks permit the pattern and frequency of certain errors to be 
captured. The delivery of these machine checks to system software can be disabled 
by setting IPR MCES<4:3>, as described in Chapter 5. Note that setting IPR 
MCES<4:3> does not disable the generation of the machine check or the correction of 
the error, but rather suppresses the reporting of that correction to system software. 


The PC in the machine check stack frame is that of the next instruction that would 
have issued if the machine check condition had not occurred. This is not necessarily 
the address of the instruction immediately following the one encountering the error, 
and intervening instructions may have changed operands or other state used by the 
instruction encountering the error condition. A CALL_PAL REI instruction to this 
PC will simply continue execution from the point at which the machine check was 
taken.


NOTE 
On machine checks, a meaningful PC is delivered on a 
best-effort basis. The machine state, processor registers, 
memory, and I/O devices may be indeterminate. 


Machine checks may be deliberately generated by software, such as by probing 
nonexistent memory during memory sizing or searching for local I/O devices. In such 
a case, the DRAINA PALcode instruction can be called to force any outstanding 
machine checks to be taken before continuing. 


The reaction of system software to machine checks is specific to the characteristics 
of the processor, platform, and system software. System software must determine if 
operation should be discontinued on an implementation-specific basis. 


To assist system software, PALcode provides a retry flag in the machine check logout 
frame (see Figure 6-6). If set, the state of the processor and platform hardware has 
not been compromised; system software operation should be able to continue. 


If the retry flag is clear, the state of the processor is either unknown or is known to 
have been updated during partial execution of one or more instructions. System 
software operation can continue only after system software determines that the 
hardware state change permits and/or takes corrective action. 


PALcode should take appropriate implementation-specific actions prior to setting 
the retry flag. PALcode should also attempt to ensure that each encountered error 
condition generates only one machine check. 


IMPLEMENTATION NOTE 
An important example of using the retry flag is read 
NXM. 


Also, a read NXM should not generate both a Processor 
Machine Check and a System Machine Check. 


PALcode sets an internal Machine-Check-In-Progress flag in the Machine Check 
Error Summary (MCES) register prior to initiating a system or processor machine 
check. System software must clear that flag to dismiss the machine check. If a second 
uncorrectable machine check hardware error condition is detected while the flag is 
set, or if PALcode cannot deliver the machine check, PALcode forces the processor to 
enter console I/O mode, and subsequent actions, such as processor restart, are taken 
by the console. The REASON FOR HALT code is “double error abort encountered”. 
\ See Platform Section, Chapter 4. \ 


Similarly, PALcode sets an internal correctable Machine-Check-In-Progress flag in 
the Machine Check Error Summary (MCES) register prior to initiating a system 
correctable error interrupt or processor correctable machine check. System software 
must clear that flag to dismiss the condition and permit the reuse of the logout area. 
If a second correctable hardware error condition is detected while the flag is set, the 
error is corrected, but not reported. PALcode does not overwrite the logout area and 
the processor remains in program I/O mode. 


6.5.2 Logout Areas 


When a hardware error condition is encountered, PALcode optionally builds a logout 
frame prior to passing control to the machine check service routine. \ The logout 
frame is built in the Logout Area located by the processor’s per-CPU slot in the 
HWRPB; see Platform Section, Chapter 3. \ 


Figure 6-6: Corrected Error and Machine Check Logout Frame 


Table 6-4: Corrected Error and Machine Check Logout Frame Fields

Offset        Description

FRAME         FRAME SIZE - Size in bytes of the logout frame, including the
              FRAME SIZE longword.

+04           FRAME FLAGS - Informational flags.

              Bit    Description

              31     RETRY FLAG - Indicates whether execution can be resumed
                     after dismissing this machine check. Set on Corrected
                     Error interrupts; may be set on Machine Checks.

              30     SECOND ERROR FLAG - Indicates that a second correctable
                     error was encountered. Set on Corrected Error interrupts
                     when a correctable error was encountered while the
                     relevant correctable error bit (PCE or SCE) is set in
                     the MCES register. Clear on Machine Checks.

              29-0   SBZ.

+08           CPU OFFSET - Offset in bytes from the base of the logout frame
              to the cpu-specific information. If CPU OFFSET is equal to 16,
              the frame contains no PALcode-specific information. If CPU
              OFFSET is equal to SYS OFFSET, the frame contains no
              cpu-specific information.

+12           SYS OFFSET - Offset in bytes from the base of the logout frame
              to the system-specific information. If SYS OFFSET is equal to
              FRAME SIZE, the frame contains no system-specific information.

+16           PALCODE INFORMATION - PALcode-specific logout information.

+CPU OFFSET   CPU INFORMATION - Cpu-specific logout information.

+SYS OFFSET   SYS INFORMATION - System platform-specific logout information.


The logout frame is optional; the service routine uses R4 to locate the frame, if 
any. Upon entry to the service routine, R4 contains the byte offset of the logout 
frame from the base of the logout area. If no frame was built, R4 contains -1 
(FFFF FFFF FFFF FFFF (hex)). 
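
As an illustration of how a service routine might interpret R4 and the frame header described in Table 6-4, the following is a minimal Python sketch, not part of the architecture. It assumes the four header fields are little-endian 32-bit longwords at offsets 0, 4, 8, and 12, and it models the logout area as a byte string; the function and key names are hypothetical.

```python
import struct

NO_FRAME = 0xFFFF_FFFF_FFFF_FFFF   # R4 value (-1) when no logout frame was built
RETRY_FLAG = 1 << 31               # FRAME FLAGS bit 31
SECOND_ERROR_FLAG = 1 << 30        # FRAME FLAGS bit 30

def parse_logout_frame(logout_area, r4):
    """Decode the frame header at byte offset r4, or return None if no frame."""
    if r4 == NO_FRAME:
        return None
    # Assumed layout: FRAME SIZE, FRAME FLAGS, CPU OFFSET, SYS OFFSET (longwords).
    frame_size, flags, cpu_off, sys_off = struct.unpack_from("<4I", logout_area, r4)
    return {
        "frame_size": frame_size,
        "retry": bool(flags & RETRY_FLAG),
        "second_error": bool(flags & SECOND_ERROR_FLAG),
        "has_pal_info": cpu_off != 16,        # PALcode info starts at +16
        "has_cpu_info": cpu_off != sys_off,
        "has_sys_info": sys_off != frame_size,
    }
```

A handler would typically test the `retry` entry first, since a clear retry flag means the processor state may have been compromised.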


6.6 System Control Block 


The System Control Block (SCB) specifies the entry points for exception, interrupt, 
and machine check service routines. The block is from 8K to 32K bytes long, must 
be page aligned, and must be physically contiguous. The PFN is specified by the 
value of the System Control Block Base (SCBB) internal register. 


The SCB consists of from 512 to 2048 entries, each 16 bytes long. The first 8 bytes 
of an entry, the vector, specify the virtual address of the service routine associated 
with that entry. The second 8 bytes, the parameter, are an arbitrary quadword value 
to be passed to the service routine. 
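
The entry layout above can be sketched as follows. This is an illustrative Python model only: it treats the SCB as a byte string and assumes little-endian quadwords; the helper names are hypothetical.

```python
import struct

ENTRY_SIZE = 16  # 8-byte vector followed by 8-byte parameter

def scb_entry(scb, byte_offset):
    """Return (vector, parameter) for the SCB entry at byte_offset."""
    if byte_offset % ENTRY_SIZE != 0:
        raise ValueError("SCB offsets are multiples of 16")
    if not 0 <= byte_offset <= len(scb) - ENTRY_SIZE:
        raise ValueError("offset lies outside the SCB")
    return struct.unpack_from("<2Q", scb, byte_offset)

def scb_entry_count(scb_bytes):
    """Number of 16-byte entries in an SCB of the given size."""
    return scb_bytes // ENTRY_SIZE
```

With these definitions an 8K-byte block holds 512 entries and a 32K-byte block holds 2048, matching the limits stated above.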


The SCB entries are grouped into those for: 
Faults 

Arithmetic traps 

Asynchronous system traps 

Data alignment trap 

Other synchronous traps 

Processor software interrupts 

Processor hardware interrupts 

I/O device interrupts 

Machine checks 


The first 512 entries (offsets 0000 through 1FF0 (hex)) contain all architecturally defined 
and any statically allocated entries. All remaining SCB entries, if any, are used 
only for those I/O device interrupt vectors that are assigned dynamically by system 
software. It is the responsibility of that software to ensure the consistency of the 
assigned vector and the SCB entry. 


6.6.1 SCB Entries for Faults 
The exception handler for a fault executes with the IPL unchanged, in Kernel mode, 
on the Kernel stack. 


Table 6-5: SCB Entries for Faults

Byte
offset (hex)   Entry name

000            -unused-
010            Floating disabled fault
020-070        -unused-
080            Access Control Violation fault
090            Translation Not Valid fault
0A0            Fault on Read fault
0B0            Fault on Write fault
0C0            Fault on Execute fault
0D0-0F0        -unused-


6.6.2 SCB Entries for Arithmetic Traps 
The exception handler for an arithmetic trap executes with the IPL unchanged, in 
Kernel mode, on the Kernel stack. 


Table 6-6: SCB Entries for Arithmetic Traps

Byte
offset (hex)   Entry name

200            Arithmetic Trap
210-230        -unused-


6.6.3 SCB Entries for Asynchronous System Traps (ASTs) 
The interrupt handler for an asynchronous system trap executes at IPL 2, in Kernel 
mode, on the Kernel stack. 


Table 6-7: SCB Entries for Asynchronous System Traps

Byte
offset (hex)   Entry name

240            Kernel Mode AST
250            Executive Mode AST
260            Supervisor Mode AST
270            User Mode AST


6.6.4 SCB Entries for Data Alignment Traps
The exception handler for a data alignment trap executes with the IPL unchanged, 
in Kernel mode, on the Kernel stack. 


Table 6-8: SCB Entries for Data Alignment Trap

Byte
offset (hex)   Entry name

280            Unaligned_Access
290-3F0        -unused-


6.6.5 SCB Entries for Other Synchronous Traps 


The exception handler for a synchronous trap, other than those described above, 
executes with the IPL unchanged, in the mode and on the stack indicated below. 
“MostPriv” indicates that the handler executes in either the original mode or the 
new mode, whichever is the most privileged. 


Table 6-9: SCB Entries for Other Synchronous Traps

Byte
Offset (hex)   Entry Name                   Mode

400            Breakpoint Trap              Kernel
410            Bug Check Trap               Kernel
420            Illegal Instruction Trap     Kernel
430            Illegal Operand Trap         Kernel
440            Generate Software Trap       Kernel
450            -unused-
460            -unused-
470            -unused-
480            Change Mode to Kernel        Kernel
490            Change Mode to Executive     MostPriv
4A0            Change Mode to Supervisor    MostPriv
4B0            Change Mode to User          Current
4C0-4F0        -reserved for Digital-


6.6.6 SCB Entries for Processor Software Interrupts 


The exception handler for a processor software interrupt executes at the target IPL, 
in Kernel mode, on the Kernel stack.


Table 6-10: SCB Entries for Processor Software Interrupts

Byte
Offset (hex)   Entry Name                     Target IPL (decimal)

500            -unused-
510            Software interrupt level 1     1
520            Software interrupt level 2     2
530            Software interrupt level 3     3
540            Software interrupt level 4     4
550            Software interrupt level 5     5
560            Software interrupt level 6     6
570            Software interrupt level 7     7
580            Software interrupt level 8     8
590            Software interrupt level 9     9
5A0            Software interrupt level 10    10
5B0            Software interrupt level 11    11
5C0            Software interrupt level 12    12
5D0            Software interrupt level 13    13
5E0            Software interrupt level 14    14
5F0            Software interrupt level 15    15


6.6.7 SCB Entries for Processor Hardware Interrupts 


The interrupt handler for a processor hardware interrupt executes at the target IPL, 
in Kernel mode, on the Kernel stack. 


Table 6-11: SCB Entries for Processor Hardware Interrupts

Byte
Offset (hex)   Entry name                       Target IPL (decimal)

600            Interval clock interrupt         22
610            Interprocessor interrupt         22
640            Powerfail interrupt              30
650            Performance monitor              29
680-6E0        Reserved - processor specific
6F0            Passive Release                  20-23


Processor-specific SCB entries include those used by console devices (if any) or other 
peripherals dedicated to system support functions. 


6.6.8 SCB Entries for I/O Device Interrupts 


The interrupt handler for an I/O device interrupt executes at the target IPL, in 
Kernel mode, on the Kernel stack. SCB entries for offsets of 800 (hex) through 7FF0 (hex) 
are reserved for I/O device interrupts. 


6.6.9 SCB Entries for Machine Checks 


The handler for machine checks executes in Kernel mode, on the Kernel stack. The 
handler for system correctable machine checks executes at IPL 20; the handler for 
all other machine checks executes at IPL 31. 


Table 6-12: SCB Entries for Machine Checks

Byte
Offset (hex)   Entry Name                           Target IPL (decimal)

620            System correct. machine check        20
630            Processor correct. machine check     31
660            System machine check                 31
670            Processor machine check              31


6.7 PALcode Support 


6.7.1 Stack Writability 


In response to various exceptions, interrupts, and machine checks, PALcode pushes 
information on the Kernel stack. PALcode may write this information without 
first probing to ensure that all such writes to the Kernel stack will succeed. If a 
memory management exception occurs while pushing information, PALcode forces 
the processor to enter console I/O mode, and subsequent actions, such as processor 
restart, are taken by the console. The REASON FOR HALT code is “processor halted 
due to kernel-stack-not-valid”. \ See Platform Section, Chapter 4. \ 


6.7.2 Stack Residency 


The User, Supervisor, and Executive stacks for the current process do not need to be 
resident. Software running in Kernel mode can bring in or allocate stack pages as 
TNV faults occur. However, since this activity is taking place in Kernel mode, the 
Kernel stack must be fully resident.


The faults TNV, ACV, FOR, and FOW, occurring on Kernel mode references to the 
Kernel stack, are considered serious system failures from which recovery is not 
possible. If any of these faults occur, PALcode forces the processor to enter console I/O 
mode, and subsequent actions, such as processor restart, are taken by the console. 
The REASON FOR HALT code is “processor halted due to kernel-stack-not-valid”. 
\ See Platform Section, Chapter 4. \ 


6.7.3 Stack Alignment 


Stacks may have arbitrary byte alignment, but performance may suffer if at least 
octaword alignment is not maintained by software. 


PALcode creates stack frames in response to exceptions and interrupts. Before doing 
so, the target stack is aligned to a 64-byte boundary by setting the six low bits of the 
target SP to 000000 (bin). The previous value of these bits is stored in the SP_ALIGN 
field of the saved PS in memory, for use by a CALL_PAL REI instruction. 


Software-constructed stack frames must be 64 byte aligned and have SP_ALIGN 
properly set; otherwise, a CALL_PAL REI instruction will take an illegal operand 
trap.
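
The alignment step can be illustrated with a short Python sketch. It is not part of the architecture: the function names are hypothetical, the position of SP_ALIGN (bit 56 upward) is taken from the LEFT SHIFT by 56 in the initiation model, and the way `restore_sp` recombines the saved bits is an assumption about what CALL_PAL REI does with them.

```python
SP_ALIGN_SHIFT = 56   # per LEFT_SHIFT(save_align, 56) in the initiation model
FRAME_BYTES = 64      # PS, PC, and R2..R7: eight quadwords

def align_for_frame(sp, ps):
    """Clear SP<5:0> and record the old bits in the PS image to be saved."""
    save_align = sp & 0x3F
    aligned_sp = sp & ~0x3F
    saved_ps = ps | (save_align << SP_ALIGN_SHIFT)
    return aligned_sp, saved_ps

def restore_sp(frame_base, saved_ps):
    """Assumed inverse: pop the frame, then put the saved low bits back."""
    save_align = (saved_ps >> SP_ALIGN_SHIFT) & 0x3F
    return (frame_base + FRAME_BYTES) | save_align
```

Because the aligned SP has six zero low bits, ORing the saved field back in reproduces the pre-exception SP exactly.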


6.7.4 Initiate Exception or Interrupt or Machine Check 


Exceptions, interrupts, and machine checks are initiated by PALcode with 
interrupts disabled. When an exception, interrupt, or machine check is initiated, 
the associated SCB vector is read to determine the address of the service routine. 
PALcode then attempts to push the PC, PS, and R2..R7 onto the target stack. When 
an interrupt (software or hardware but not AST) is initiated, PS<IP> is set to 1 to 
indicate an interrupt is in progress. Additional parameters may be passed in R4 
and R5 on exceptions and machine checks. 


During the attempt to push this information, the exceptions (faults) TNV, ACV, and 
FOW can occur: 


• If any of those faults occur when the target stack is User, Supervisor, or 
  Executive, then the fault is taken on the Kernel stack. 


• If any of those faults occur when the target stack is the Kernel stack, PALcode 
  forces the processor to enter console I/O mode, and subsequent actions, such as 
  processor restart, are taken by the console. The REASON FOR HALT code is 
  “processor halted due to kernel-stack-not-valid”. \ See Platform Section, 
  Chapter 4. \ 


6.7.5 Initiate Exception or Interrupt or Machine Check Model 


check_for_exception_or_interrupt_or_mcheck:
IF NOT {ready_to_initiate_exception OR
        ready_to_initiate_interrupt OR
        ready_to_initiate_mcheck} THEN
    BEGIN
        {fetch next instruction}
        {decode and execute instruction}
    END
ELSE
    BEGIN
        {wait for instructions in progress to complete}

        ! clear interrupt pending
        tmp ← 0

        IF {unmaskable mcheck pending} THEN
            BEGIN
                {back up implementation specific state if necessary}
                {attempt correction if appropriate}
                IF {uncorrectable AND MCES<0> = 1} THEN
                    {enter console}
                ELSE IF {uncorrectable} THEN
                    new_mode ← Kernel
                    new_ipl ← 31
                    ! set mcheck error flag
                    MCES<0> ← 1
                ELSE IF {reporting enabled} THEN
                    new_mode ← Kernel
                    new_ipl ← 31
                    MCES<2> ← 1
                END
            END

        ELSE IF {data alignment trap} THEN
            new_mode ← Kernel

        ELSE IF {synchronous trap} THEN
            CASE {opcode} OF
                {back up implementation specific state if necessary}
                CHME:      new_mode ← min(PS<CM>, Executive)
                CHMS:      new_mode ← min(PS<CM>, Supervisor)
                CHMU:      new_mode ← min(PS<CM>, User)
                otherwise: new_mode ← Kernel
            ENDCASE

        ELSE IF {maskable uncorrectable mcheck pending and IPL < 31} THEN
            BEGIN
                {back up implementation specific state if necessary}
                IF {MCES<0> = 1} THEN
                    {enter console}
                ELSE
                    new_mode ← Kernel
                    new_ipl ← 31
                    MCES<0> ← 1          ! set mcheck error flag
                END
            END

        ELSE
            new_mode ← Kernel
        END

        IPR_SP[PS<CM>] ← SP
        new_sp ← IPR_SP[new_mode]

        IF {exception pending} THEN
            BEGIN
                {back up implementation specific state if necessary}
                new_ipl ← PS<IPL>
            END

        ELSE IF {interrupt pending} THEN
            new_ipl ← {interrupt source IPL}
            tmp ← 1                      ! set interrupt pending

        ELSE IF {maskable correctable mcheck pending AND
                 reporting enabled} THEN
            new_ipl ← 20
            MCES<1> ← 1
        END

        save_align ← new_sp<5:0>
        new_sp<5:0> ← 0

        PUSH(PS OR LEFT_SHIFT(save_align,56), old_pc, new_mode)
        PUSH(R7, R6, new_mode)
        PUSH(R5, R4, new_mode)
        PUSH(R3, R2, new_mode)

        PS<SW> ← 0
        PS<CM> ← new_mode
        PS<IP> ← tmp
        PS<IPL> ← new_ipl
        SP ← new_sp

        IF {memory management fault} THEN
            R4 ← VA
            R5 ← MMF
        END

        IF {data alignment trap} THEN
            R4 ← VA
            R5 ← { 0 if read/load, 1 if write/store }
        END

        IF {mcheck or correctable error interrupt} THEN
            IF {logout frame built}
                R4 ← logout_area offset
            ELSE
                R4 ← -1
            END
        END

        IF {arithmetic trap} THEN
            R4 ← register write mask
            R5 ← exception summary
        END

        IF {software interrupt} THEN
            SISR ← SISR AND NOT{ 2**{ PRIORITY_ENCODE(SISR) } }
        END

        vector ← {exception or interrupt or mcheck SCB offset}

        R2 ← (SCBB + vector)
        R3 ← (SCBB + vector + 8)
        PC ← R2

    END
GOTO check_for_exception_or_interrupt_or_mcheck


PROCEDURE PUSH(first, last, mode)
BEGIN
    IF ACCESS(new_sp - 16, mode) THEN
        BEGIN
            (new_sp - 8) ← first
            (new_sp - 16) ← last
            new_sp ← new_sp - 16
            RETURN
        END
    ELSE
        {initiate ACV, TNV, or FOW fault, or
         Kernel Stack Not Valid restart sequence}
    END
END
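

To make the resulting frame layout concrete, the four PUSH operations in the model can be simulated in a few lines of Python. This is a sketch under simplifying assumptions: memory is a dictionary keyed by address, the ACCESS check and fault paths are omitted, and the function name is hypothetical.

```python
def build_frame(new_sp, ps, pc, regs):
    """Simulate the model's four PUSH calls; regs maps 'R2'..'R7' to values.

    Returns (final_sp, memory), where memory maps quadword addresses to values.
    """
    memory = {}

    def push(first, last):
        nonlocal new_sp
        memory[new_sp - 8] = first    # (new_sp - 8)  <- first
        memory[new_sp - 16] = last    # (new_sp - 16) <- last
        new_sp -= 16

    push(ps, pc)
    push(regs["R7"], regs["R6"])
    push(regs["R5"], regs["R4"])
    push(regs["R3"], regs["R2"])
    return new_sp, memory
```

Reading the eight quadwords upward from the final SP gives R2, R3, R4, R5, R6, R7, PC, PS, so the saved PS with its SP_ALIGN field sits at the highest address of the 64-byte frame.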


6.7.6 PALcode Interrupt Arbitration 


The following sections describe the logic for the interrupt conditions produced by the 
specified operation. 


6.7.6.1 Writing the AST Summary Register 


Writing the ASTSR internal processor register (see Section 5.3) requests an AST for 
any of the four processor modes. This may request an AST on a formerly inactive 
level and thus cause an AST interrupt. 


The logic required to check for this condition is:


ASTSR<3:0> ← {ASTSR<3:0> AND R16<3:0>} OR R16<7:4>
IF ASTEN<0> AND ASTSR<0> AND {PS<IPL> LT 2} THEN
    {initiate AST interrupt at IPL 2}
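

The keep-mask and set-mask update above can be checked with a small Python sketch; the function names are hypothetical and registers are modeled as plain integers.

```python
def write_astsr(astsr, r16):
    """New ASTSR<3:0>: keep bits selected by R16<3:0>, then set bits in R16<7:4>."""
    return ((astsr & r16) | (r16 >> 4)) & 0xF

def kernel_ast_deliverable(asten, astsr, ipl):
    """The condition tested after the write: ASTEN<0>, ASTSR<0>, and IPL below 2."""
    return bool(asten & 1) and bool(astsr & 1) and ipl < 2
```

For example, R16 = 0F (hex) leaves the register unchanged, while R16 = 20 (hex) clears all four bits and then sets only bit 1. The same update rule applies to the ASTEN write.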


6.7.6.2 Writing the AST Enable Register 


Writing the ASTEN internal processor register (see Section 5.3) enables ASTs for 
any of the four processor modes. This may enable an AST on a formerly inactive 
level and thus cause an AST interrupt. 


The logic required to check for this condition is: 


ASTEN<3:0> ← {ASTEN<3:0> AND R16<3:0>} OR R16<7:4>
IF ASTEN<0> AND ASTSR<0> AND {PS<IPL> LT 2} THEN
    {initiate AST interrupt at IPL 2}


6.7.6.3 Writing the IPL Register 


Writing the IPL internal processor register (see Section 5.3) changes the current 
IPL. This may enable an AST or software interrupt on a formerly inactive level and 
thus cause an AST or software interrupt. 


The logic required to check for this condition is: 


PS<IPL> ← R16<4:0>

! check for software interrupt at level 2..15
IF {RIGHT_SHIFT({SISR AND FFFC (hex)}, PS<IPL> + 1) NE 0} THEN
    {initiate software interrupt at IPL of high bit set in SISR}

! check for AST
IF ASTEN<0> AND ASTSR<0> AND {PS<IPL> LT 2} THEN
    {initiate AST interrupt at IPL 2}

! check for software interrupt at level 1
IF SISR<1> AND {PS<IPL> EQ 0} THEN
    {initiate software interrupt at IPL 1}
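

The three checks can be exercised with the Python sketch below. It is illustrative only: the function name and return convention are hypothetical, and returning the first condition that holds is an assumption about how the sequential tests are prioritized.

```python
def after_ipl_write(sisr, asten, astsr, ipl):
    """Return ('software', level), ('ast', 2), or None for the new IPL."""
    high = sisr & 0xFFFC                      # software levels 2..15 only
    if (high >> (ipl + 1)) != 0:
        return ("software", high.bit_length() - 1)   # highest bit set in SISR
    if (asten & 1) and (astsr & 1) and ipl < 2:
        return ("ast", 2)
    if (sisr & 0b10) and ipl == 0:
        return ("software", 1)
    return None
```

Level 1 is tested separately and last because an AST is delivered at IPL 2, which outranks a level 1 software interrupt.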


6.7.6.4 Writing the Software Interrupt Request Register 


Writing the SIRR internal processor register (see Section 5.3) requests a software 
interrupt at one of the fifteen software interrupt levels. This may cause a formerly 
inactive level to cause a software interrupt. 


The logic required to check for this condition is: 


SISR<level> ← 1
IF level GT PS<IPL> THEN
    {initiate software interrupt at IPL level}


6.7.6.5 Return from Exception or Interrupt 


The CALL_PAL REI instruction (see Chapter 2) writes both the Current Mode and 
IPL fields of the PS; see Section 6.2. This may enable a formerly disabled AST or 
software interrupt to occur. 


The logic required to check for this condition is: 


PS ← New_PS

! check for software interrupt at level 2..15
IF {RIGHT_SHIFT({SISR AND FFFC (hex)}, PS<IPL> + 1) NE 0} THEN
    {initiate software interrupt at IPL of high bit set in SISR}

! check for AST
tmp ← NOT LEFT_SHIFT(1110 (bin), PS<CM>)
IF {{tmp AND ASTEN AND ASTSR}<3:0> NE 0} AND {PS<IPL> LT 2} THEN
    {initiate AST interrupt at IPL 2}

! check for software interrupt at level 1
IF SISR<1> AND {PS<IPL> EQ 0} THEN
    {initiate software interrupt at IPL 1}
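

The mode mask computed into tmp is easiest to see by enumerating it. The Python sketch below is illustrative; it assumes the mode encoding Kernel = 0, Executive = 1, Supervisor = 2, User = 3, and the function names are hypothetical.

```python
def ast_mask_for_mode(cm):
    """tmp<3:0> = NOT LEFT_SHIFT(1110 (bin), PS<CM>): modes at or above CM in privilege."""
    return ~(0b1110 << cm) & 0xF

def ast_pending_after_rei(cm, asten, astsr, ipl):
    """True when an enabled, requested AST is eligible for the new mode and IPL."""
    return (ast_mask_for_mode(cm) & asten & astsr) != 0 and ipl < 2
```

Under that encoding, returning to Kernel mode selects only bit 0, while returning to User mode selects all four bits, so an AST for any mode can then be delivered.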


6.7.6.6 Swap AST Enable 


Swapping the AST enable state for the Current Mode results in writing the ASTEN 
internal processor register (see Section 5.3). This may enable a formerly disabled 
AST to cause an AST interrupt. 


The logic required to check for this condition is: 


R0 ← ZEXT(ASTEN<PS<CM>>)
ASTEN<PS<CM>> ← R16<0>

IF ASTEN<PS<CM>> AND ASTSR<PS<CM>> AND {PS<IPL> LT 2} THEN
    {initiate AST interrupt at IPL 2}


6.7.7 Processor State Transition Table 


Table 6-13 shows the operations that can produce a state transition and the specific 
transition produced. For example, if a processor’s initial state is Supervisor mode, it 
is not possible for the processor to transition to a program halt condition. A processor 
can only transition to program halt from Kernel mode. 


In Table 6-13: 

• REI increases mode or lowers IPL. 

• MTPR changes IPL, or is a CALL_PAL MTPR_ASTSR or CALL_PAL MTPR_ASTEN 
  instruction that causes an interrupt request. 

• Exc is a state change caused by an exception. 

• Int is a state change caused by an interrupt. 

• Mcheck is a state change caused by a machine check. 


Table 6-13: Processor State Transitions

                                     Final State
Initial                                                          Program
State        User        Super.      Exec.       Kernel          Halt

User         CHMU        CHMS        CHME        CHMK            Not
             REI                                 Exc             Possible
                                                 Int
                                                 Mcheck
                                                 SWASTEN

Supervisor   REI         CHMS        CHME        CHMK            Not
                         REI                     Exc             Possible
                                                 Int
                                                 Mcheck
                                                 SWASTEN

Executive    REI         REI         CHME        CHMK            Not
                                     REI         Exc             Possible
                                                 Int
                                                 Mcheck
                                                 SWASTEN

Kernel       REI         REI         REI         CHMK            HALT
                                                 REI
                                                 Int
                                                 Exc
                                                 Mcheck
                                                 MTPR
                                                 SWASTEN




6.8 \REVISION HISTORY 
Revision 5.0, May 12, 1992 


1.  Removed intr_flag and lock_flag from initiate exception, interrupt, or
    mcheck model

2.  Added eco #45: correctable errors (machine checks), performance monitor,
    and passive release information

3.  Conditionalized references to platform section

4.  Widget -> device

5.  Reordered and combined sections to consolidate information

6.  Added eco #30, #44 (DATFX) also eco #29 (GENTRAP)

7.  Corrected init exception model for eco 25 PS(IP) bit and eco 23 (timer)

8.  DRAINT to TRAPB

9.  Converted to SDML

10. Added ECO #18, #23 (removed AT references), #25

11. Integrate references for Console ECO #15


Revision 4.0, March 29, 1991 


1.  On Memory Management Faults, R4 now contains the exact faulting address

2.  Removed references to D_float

3.  Typos

4.  Note reason for unaligned load locked and store conditional vectors

5.  Correct reference from AST Request Register to AST Summary Register

6.  Correct pointer to location of physical address of error logout area from
    R2 to R4 in Processor Machine Check Abort section

7.  Correct two references from Corrected Error logout area to Machine Check
    logout area

8.  Change name of 'instruction issue model' to 'initiate exception or
    interrupt model'

9.  Swap order of data alignment trap and synchronous trap code fragments in
    initiate exception or interrupt model

10. Correct which bits are loaded (<4:0>) from R16 into IPR IPL by MTPR IPL

11. Add REI* and CHMx to each entry along the main diagonal of the Processor
    State Transition table

12. Describe machine check logout area as reserved for PALcode and console use

13. Add R2..R7 to values restored by REI in 'Stack Frames' text

14. Modify logic statement for Swap AST Enable so that it reflects CALL_PAL
    SWASTEN instruction action

15. Remove references to ASTs as 'interrupts', substituting 'exception' where
    appropriate

16. Move and modify reference to ASTs in last bullet item of section
    Exceptions to section Asynchronous System Trap

17. Define meaning of 'trigger instruction' in text of arithmetic trap
    description

18. Change values defined in R5 for memory management faults to full quadword
    values

19. Modify initiate exception or interrupt model to show bit corresponding to
    software interrupt being dispatched to is cleared before the dispatch

20. Modified tense of description of saved PC for arithmetic trap from 'would
    have issued' to 'would have been issued'

21. Move power-fail text at end of section Interprocessor Interrupt Request
    Register to end of subtext of section Interrupts

22. Clarify reference to 'RA' in initiate exception or interrupt pseudocode

23. Change 'vector ← {exception ..}' to 'vector ← {exception or interrupt ..}'
    in initiate exception or interrupt pseudocode

24. Note that there are four per-mode SCB vectors for ASTs

25. Add entry for Software Interrupts to table Exceptions and Interrupts
    Summary

26. Restrict the class of instructions that are described as taking Invalid
    Operation traps on non-finite values

27. Clarify that, following a memory management fault, R4 contains an address
    within the implementation-dependent-sized page that contains the faulting
    address

28. Reorganize the sections on synchronous traps (starting around current
    section $$section(synchr_trap)) to eliminate references to ASTs under
    Other Synchronous Traps category

29. Elaborate Interval Clock Interrupt description

30. Changed Load and Store D to G in SCB entries table for Alignment Traps

31. Moved 'perf. monitor' from Asynchronous Traps to Hardware Interrupts


Revision 3.0, March 2, 1990 


1.  Get PS/PC in correct order in stack frames

2.  Restructure stack frames and R2..R7

3.  Increase stack frame alignment to 64 byte

4.  Restructure SCB

5.  Change some faults to synchronous traps

6.  Redo and simplify arithmetic traps

7.  Rework AST delivery to match VAX

8.  Specify writeback cache behavior at powerfail

9.  Remove IPL from Processor State Transition Table

10. Remove Privileged instruction Trap


Revision 2.0, October 4, 1989 


1.  Remove interrupt stack

2.  Remove kernel stack not valid abort

3.  Remove stack alignment requirement, add PS<SP_ALIGN>

4.  Remove ICIE and IPIE interrupt enables

5.  Remove FREEZE of PC

6.  Remove references to WAIT

7.  Add DRAINT and DRAINA

8.  Delete operand faults

9.  Make data alignment fault stay in current mode

10. Simplify floating exceptions


Revision 1.0, May 23, 1989 


1.  First review distribution.


DEC OSF/1 Alpha Software (III) 


This section describes how the DEC OSF/1 operating system relates to the Alpha 
architecture, and includes the following chapters: 


• Chapter 1, Introduction to DEC OSF/1 Alpha (III) 

• Chapter 2, OSF/1 PALcode Instruction Descriptions (III) 

• Chapter 3, OSF/1 Memory Management (III) 

• Chapter 4, OSF/1 Process Structure (III) 

• Chapter 5, OSF/1 Exceptions and Interrupts (III) 


Chapter 1 


Introduction to DEC OSF/1 Alpha (III) 


The goal of this design is to provide a hardware-implementation-independent 
interface between the hardware and DEC OSF/1 Alpha. The interface needs to 
provide the abstractions required to minimize the impact of different hardware 
implementations on the operating system. The interface also needs to be low in 
overhead to support high-performance systems. Lastly, the interface needs to 
support only the features used by DEC OSF/1 Alpha. 


The register usage in this interface is based on the current calling standard used by 
DEC OSF/1 Alpha. If the calling standard changes, this interface will be changed 
to reflect that. The current calling standard register usage is shown in Table 1-1. 


Table 1-1: DEC OSF/1 Alpha Register Usage

Register    Software
Name        Name         Use and linkage

r0          v0           Used for expression evaluations and to hold integer
                         function results.
r1..r8      t0..t7       Temporary registers; not preserved across procedure
                         calls.
r9..r14     s0..s5       Saved registers; their values must be preserved
                         across procedure calls.
r15         FP or s6     Frame pointer or a saved register.
r16..r21    a0..a5       Argument registers; used to pass the first 6 integer
                         type arguments; their values are not preserved across
                         procedure calls.
r22..r25    t8..t11      Temporary registers; not preserved across procedure
                         calls.
r26         ra           Contains the return address; used for expression
                         evaluation.
r27         pv or t12    Procedure value or a temporary register.
r28         at           Assembler temporary register; not preserved across
                         procedure calls.
r29         GP           Global pointer.
r30         SP           Stack pointer.
r31         zero         Always has the value 0.
1.1 Programming Model


The programming model of the machine is the combination of the state visible either
directly via instructions, or indirectly via actions of the machine. The following four
tables define constants, state variables, terms, and subroutines used in the rest of
the document.


1.1.1 Code Flow Constants


Table 1-2: Code Flow Constants


Term         Meaning and value

IPL = 2:0    The range 2:0 used in the PS to access the IPL field of the PS
             (PS<IPL>).

maxCPU       The maximum number of processors in a given system.

mode = 3     Used as a subscript in PS to select current mode (PS<mode>).

pageSize     Size of a page in an implementation in bytes.

vaSize       Size of virtual address in bits in a given implementation.


1.1.2 Machine State Terms 


Table 1-3: Machine State Terms


Term                 Meaning

ASN                  An implementation-dependent size register to hold the current
                     address space number (ASN). The size and existence of ASN is an
                     implementation choice.

entArith<63:0>       The arithmetic trap entry address register. The entArith is an
                     internal processor register that holds the dispatch address on an
                     arithmetic trap. There can be a hardware register for the entArith
                     or the PALcode can use private scratch memory.

entIF<63:0>          The instruction fault entry address register. The entIF is an internal
                     processor register that holds the dispatch address on an instruction
                     fault. There can be a hardware register for the entIF or the PALcode
                     can use private scratch memory.

entInt<63:0>         The interrupt entry address register. The entInt is an internal
                     processor register that holds the dispatch address on an interrupt.
                     There can be a hardware register for the entInt or the PALcode can
                     use private scratch memory.

entMM<63:0>          The memory-management fault entry address register. The entMM
                     is an internal processor register that holds the dispatch address on
                     a memory-management fault. There can be a hardware register for
                     the entMM or the PALcode can use private scratch memory.


entSys<63:0>         The system call entry address register. The entSys is an internal
                     processor register that holds the dispatch address on a callsys
                     instruction. There can be a hardware register for the entSys or the
                     PALcode can use private scratch memory.

entUna<63:0>         The unaligned fault entry address register. The entUna is an internal
                     processor register that holds the dispatch address on an unaligned
                     fault. There can be a hardware register for the entUna or the PALcode
                     can use private scratch memory.

FEN<0>               The floating-point enable register. The FEN is a one-bit register that
                     is used to enable or disable floating-point instructions. If a
                     floating-point instruction is executed with FEN equal to zero, a FEN
                     fault is initiated.

instruction<31:0>    The current instruction being executed. This is a fake register used
                     in the flows to CASE on different instructions.

intr_flag            A per-processor state bit. The intr_flag bit is cleared if that processor
                     executes an rti or retsys instruction.

KGP<63:0>            The kernel global pointer. The KGP is an internal processor register
                     that holds the kernel global pointer that is loaded into R29, the GP,
                     when an exception is initiated. There can be a hardware register for
                     the KGP or the PALcode can use private scratch memory.

KSP<63:0>            The kernel stack pointer. The KSP is an internal processor register
                     that holds the kernel stack pointer while in user mode. There can be
                     a hardware register for the KSP or the storage space in the PCB can
                     be used.

lock_flag<0>         A one-bit register that is used by the load locked and store conditional
                     instructions.

PC<63:0>             The program counter. The PC is a pointer to the next instruction in
                     the flows. The low-order two bits of the PC always read as zero and
                     writes to them are ignored.

PCB                  The process control block. The PCB holds the state of the process.

PCBB<63:0>           The process control block base address register. The PCBB holds the
                     address of the PCB for the current process.

PS<3:0>              The processor status. The PS is a four-bit register that stores the
                     current mode in bit <3> and stores the three-bit IPL in bits <2:0>.
                     The mode is 0 for kernel and 1 for user.

PTBR<63:0>           The page table base register. The PTBR contains the physical page
                     frame number (PFN) of the highest level (level 1) page table.
SP<63:0>             Another name for R30. The SP points to the top of the current stack.

                     PALcode only accesses the kernel stack. The kernel stack must
                     be quadword aligned whenever PALcode reads or writes it. If the
                     PALcode accesses the kernel stack and the stack is not aligned, a
                     kernel-stack-not-valid halt is initiated. Although PALcode does not
                     access the user stack, that stack should also be at least quadword
                     aligned for best performance.

sysvalue<63:0>       The system value register. The sysvalue holds the per-processor
                     unique value. There can be a hardware register for the sysvalue
                     register or the storage space in the PALcode scratch memory can be
                     used.

                     The sysvalue register can only be accessed by kernel mode code and
                     there is one sysvalue register per CPU.

unique<63:0>         The process unique value register. The unique register holds the
                     per-process unique value. There can be a hardware register for the
                     unique register or the storage space in the PCB can be used.

                     The unique register can be accessed by both user and kernel code and
                     there is one unique register per process.

USP<63:0>            The user stack pointer. The USP is an internal processor register
                     that holds the user stack pointer while in kernel mode. There can be
                     a hardware register for the USP or the storage space in the PCB can
                     be used.

VPTPTR<63:0>         The virtual page table pointer. The VPTPTR holds the virtual address
                     of the first level page table.

whami<63:0>          The processor number of the current processor. This number is in the
                     range 0..maxCPU-1.


1.1.3 Code Flow Terms


Table 1-4: Code Flow Terms


Term                 Meaning

opDec                An attempt was made to execute a reserved instruction or execute a
                     privileged instruction in user mode.
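
The PS layout given in Tables 1-2 and 1-3 (mode in bit <3>, IPL in bits <2:0>) can be illustrated with a small sketch. The helper names ps_ipl, ps_mode, and make_ps are hypothetical and are not part of the architecture; they only restate the bit positions defined above.

```python
# Field positions from Tables 1-2 and 1-3: IPL = PS<2:0>, mode = PS<3>.
IPL_MASK = 0x7   # PS<2:0>
MODE_BIT = 3     # PS<3>; 0 = kernel, 1 = user


def ps_ipl(ps: int) -> int:
    """Return the three-bit interrupt priority level held in PS<2:0>."""
    return ps & IPL_MASK


def ps_mode(ps: int) -> int:
    """Return the current-mode bit PS<3> (0 = kernel, 1 = user)."""
    return (ps >> MODE_BIT) & 1


def make_ps(mode: int, ipl: int) -> int:
    """Pack a mode bit and an IPL into a four-bit PS value."""
    return ((mode & 1) << MODE_BIT) | (ipl & IPL_MASK)
```

For example, make_ps(1, 0) gives 8, the PS value for user mode at IPL 0.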


1.2 Revision History

Revision 1.0, May 12, 1992


• First review distribution

2.1 Unprivileged PALcode Instructions


Table 2-1 lists the OSF/1 PALcode unprivileged instruction mnemonics, names, and 
the environment from which they can be called: 


Table 2-1: Unprivileged OSF/1 PALcode Instructions


Mnemonic    Name                       Calling environment

bpt         Breakpoint trap            Kernel and user modes

bugchk      Bugcheck trap              Kernel and user modes

callsys     System call                User mode

gentrap     Generate trap              Kernel and user modes

imb         I-Stream memory barrier    Kernel and user modes
                                       Described in Common Architecture, Chapter 6

rdunique    Read unique                Kernel and user modes

wrunique    Write unique               Kernel and user modes

2.1.1 Breakpoint Trap
Format:

    bpt                              ! PALcode format

Operation:

    temp ← PS
    if (PS<mode> NE 0) then
        USP ← SP                     ! Mode is user so switch to kernel
        SP ← KSP
        PS ← 0
    endif
    SP ← SP - {6 * 8}
    (SP+00) ← temp
    (SP+08) ← PC
    (SP+16) ← GP
    (SP+24) ← a0
    (SP+32) ← a1
    (SP+40) ← a2
    a0 ← 0
    GP ← KGP
    PC ← entIF


Exceptions: 


Kernel stack not valid 


Mnemonics: 


bpt Breakpoint trap 


Description: 


The breakpoint trap (bpt) instruction switches mode to kernel, builds a stack frame
on the kernel stack, loads the GP with the KGP, loads a value of 0 into a0, and
dispatches to the breakpoint code pointed to by the entIF register. The registers
a1..a2 are UNPREDICTABLE on entry to the trap handler. The saved PC at (SP+08)
is the address of the instruction following the trap instruction that caused the trap.


Notes:


• The opcode and function code for the bpt instruction are the same in the
  OpenVMS and the OSF/1 PALcode.

2.1.2 Bugcheck Trap
Format:

    bugchk                           ! PALcode format

Operation:

    temp ← PS
    if (PS<mode> NE 0) then
        USP ← SP                     ! Mode is user so switch to kernel
        SP ← KSP
        PS ← 0
    endif
    SP ← SP - {6 * 8}
    (SP+00) ← temp
    (SP+08) ← PC
    (SP+16) ← GP
    (SP+24) ← a0
    (SP+32) ← a1
    (SP+40) ← a2
    a0 ← 1
    GP ← KGP
    PC ← entIF


Exceptions: 


Kernel stack not valid 


Mnemonics: 


bugchk      Bugcheck trap


Description:


The bugcheck trap (bugchk) instruction switches mode to kernel, builds a stack frame
on the kernel stack, loads the GP with the KGP, loads a value of 1 into a0, and
dispatches to the breakpoint code pointed to by the entIF register. The registers
a1..a2 are UNPREDICTABLE on entry to the trap handler. The saved PC at (SP+08)
is the address of the instruction following the trap instruction that caused the trap.


Notes:


• The opcode and function code for the bugchk instruction are the same in the
  OpenVMS and the OSF/1 PALcode.

2.1.3 System Call


Format:

    callsys                          ! PALcode format

Operation:

    if (PS<mode> EQ 0) then
        machineCheck
    endif
    USP ← SP
    SP ← KSP
    PS ← 0                           ! Mode=kernel
    SP ← SP - {6 * 8}
    (SP+00) ← 8                      ! PS of mode=user, IPL=0
    (SP+08) ← PC
    (SP+16) ← GP
    GP ← KGP
    PC ← entSys


Exceptions: 


Machine check—invalid kernel mode callsys 
Kernel stack not valid 


Mnemonics: 
callsys System call 


Description: 


The system call (callsys) instruction is supported only from user mode. (Issuing a
callsys from kernel mode causes a machine check exception.)


The callsys instruction switches mode to kernel and builds a callsys stack frame.
The GP is loaded with the KGP. The exception then dispatches to the system call
code pointed to by the entSys register. On entry to the callsys code, the scratch
registers t8..t11 are UNPREDICTABLE.

2.1.4 Generate Trap


Format: 


    gentrap                          ! PALcode format


Operation:

    temp ← PS
    if (PS<mode> NE 0) then
        USP ← SP                     ! Mode is user so switch to kernel
        SP ← KSP
        PS ← 0
    endif
    SP ← SP - {6 * 8}
    (SP+00) ← temp
    (SP+08) ← PC
    (SP+16) ← GP
    (SP+24) ← a0
    (SP+32) ← a1
    (SP+40) ← a2
    a0 ← 2
    GP ← KGP
    PC ← entIF


Exceptions: 


Kernel stack not valid 


Mnemonics: 


gentrap     Generate trap


Description:


The generate trap (gentrap) instruction switches mode to kernel, builds a stack frame
on the kernel stack, loads the GP with the KGP, loads a value of 2 into a0, and
dispatches to the breakpoint code pointed to by the entIF register. The registers
a1..a2 are UNPREDICTABLE on entry to the trap handler. The saved PC at (SP+08)
is the address of the instruction following the trap instruction that caused the trap.


Notes:


• The opcode and function code for the gentrap instruction are the same in the
  OpenVMS and the OSF/1 PALcode.
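
The bpt, bugchk, and gentrap flows differ only in the value loaded into a0 (0, 1, and 2). As a rough illustration, the shared flow can be modeled as below. The dict-based machine state, the memory dict keyed by address, and the function name trap_entry are assumptions of this sketch; the kernel-stack-not-valid check is not modeled.

```python
MASK64 = (1 << 64) - 1
TRAP_CODE = {"bpt": 0, "bugchk": 1, "gentrap": 2}  # value loaded into a0


def trap_entry(state: dict, memory: dict, name: str) -> None:
    """Model the common bpt/bugchk/gentrap Operation flow."""
    temp = state["PS"]
    if (state["PS"] >> 3) & 1:           # PS<mode> NE 0: user, switch to kernel
        state["USP"] = state["SP"]
        state["SP"] = state["KSP"]
        state["PS"] = 0
    state["SP"] = (state["SP"] - 6 * 8) & MASK64
    sp = state["SP"]
    # Six-quadword frame: PS, PC, GP, a0, a1, a2 at SP+00 .. SP+40.
    frame = [temp, state["PC"], state["GP"],
             state["a0"], state["a1"], state["a2"]]
    for i, value in enumerate(frame):
        memory[sp + 8 * i] = value
    state["a0"] = TRAP_CODE[name]
    state["GP"] = state["KGP"]
    state["PC"] = state["entIF"]
```

A handler entered this way can tell the three traps apart by inspecting a0 alone.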
2.1.5 Read Unique Value


Format:

    rdunique                         ! PALcode format

Operation:

    v0 ← unique

Exceptions: 

None 
Mnemonics: 

rdunique Read unique value 


Description: 


The read unique value (rdunique) instruction returns the process unique value in 
v0. The write unique value (wrunique) instruction, described in Section 2.1.6, sets 
the process unique value register. 


Notes:


• The opcode and function code for the rdunique instruction are the same in the
  OpenVMS and the OSF/1 PALcode.

2.1.6 Write Unique Value

Format:

    wrunique                         ! PALcode format

Operation:

    unique ← a0

Exceptions:
None 
Mnemonics: 
wrunique Write unique value 


Description: 


The write unique value (wrunique) instruction sets the process unique register to 
the value passed in a0. The read unique value (rdunique) instruction, described in 
Section 2.1.5, returns the process unique value. 


Notes: 


• The opcode and function code for the wrunique instruction are the same in the
  OpenVMS and the OSF/1 PALcode.

2.2 Privileged OSF/1 PALcode Instructions


The Privileged OSF/1 PALcode instructions provide an abstracted interface to control
the privileged state of the machine.


Table 2-2: Privileged OSF/1 PALcode Instructions


Mnemonic    Name

halt        Halt the processor
            Described in Common Architecture, Chapter 6
rdps        Read processor status
rdusp       Read user stack pointer
rdval       Read system value
retsys      Return from system call
rti         Return from trap, fault, or interrupt
swpctx      Swap process context
swpipl      Swap IPL
tbi         TB (translation buffer) invalidate
whami       Who am I
wrent       Write system entry address
wrfen       Write floating-point enable
wrkgp       Write kernel global pointer
wrusp       Write user stack pointer
wrval       Write system value
wrvptptr    Write virtual page table pointer

2.2.1 Read Processor Status
Format:

    rdps                             ! PALcode format

Operation:

    if (PS<mode> EQ 1) then
        {Initiate opDec fault}
    endif
    v0 ← PS

Exceptions:

Opcode reserved to Digital

Mnemonics:

rdps        Read processor status

Description:

The read processor status (rdps) instruction returns the PS in v0. On return from
the rdps instruction, registers t0 and t8..t11 are UNPREDICTABLE.

2.2.2 Read User Stack Pointer


Format:

    rdusp                            ! PALcode format

Operation:

    if (PS<mode> EQ 1) then
        {Initiate opDec fault}
    endif
    v0 ← USP

Exceptions:

Opcode reserved to Digital

Mnemonics:

rdusp       Read user stack pointer

Description:

The read user stack pointer (rdusp) instruction returns the user stack pointer
in v0. The user stack pointer is written by the wrusp instruction, described in
Section 2.2.13. On return from the rdusp instruction, registers t0 and t8..t11 are
UNPREDICTABLE.

2.2.3 Read System Value


Format:

    rdval                            ! PALcode format

Operation:

    if (PS<mode> EQ 1) then
        {Initiate opDec fault}
    endif
    v0 ← sysvalue

Exceptions:

Opcode reserved to Digital

Mnemonics:

rdval       Read system value

Description:

The read system value (rdval) instruction returns the sysvalue in v0, allowing access
to a 64-bit per-processor value for use by the operating system. On return from the
rdval instruction, registers t0 and t8..t11 are UNPREDICTABLE.

2.2.4 Return From System Call


Format: 


    retsys                           ! PALcode format


Operation:

    if (PS<mode> EQ 1) then
        {Initiate opDec fault}
    endif
    tmp ← (SP+08)
    GP ← (SP+16)
    KSP ← SP + {6 * 8}
    SP ← USP
    intr_flag = 0                    ! Clear the interrupt flag
    lock_flag = 0                    ! Clear the load lock flag
    PS ← 8                           ! Mode=user
    PC ← tmp


Exceptions: 


Opcode reserved to Digital 
Kernel stack not valid (halt) 


Mnemonics: 
retsys Return from system call 


Description: 


The return from system call (retsys) instruction pops the return address and the user 
mode global pointer from the kernel stack. It then saves the kernel stack pointer, 
sets the mode to user, sets the IPL to zero, and enters the user mode code at the 
address popped off the stack. 
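
To show how callsys and retsys pair up, the following sketch runs a system call round trip on a toy machine state. It is illustrative only: the dict-based state and memory are assumptions, and callsys is assumed to store the GP at (SP+16), the offset from which the retsys flow reloads it. The kernel-stack-not-valid halt is not modeled.

```python
def callsys(state: dict, memory: dict) -> None:
    """Enter the kernel from user mode and build the callsys frame."""
    if ((state["PS"] >> 3) & 1) == 0:
        raise RuntimeError("machine check: callsys issued from kernel mode")
    state["USP"] = state["SP"]
    state["SP"] = state["KSP"]
    state["PS"] = 0                      # mode = kernel, IPL = 0
    state["SP"] -= 6 * 8
    sp = state["SP"]
    memory[sp + 0] = 8                   # saved PS: mode = user, IPL = 0
    memory[sp + 8] = state["PC"]
    memory[sp + 16] = state["GP"]        # assumed offset, matching retsys
    state["GP"] = state["KGP"]
    state["PC"] = state["entSys"]


def retsys(state: dict, memory: dict) -> None:
    """Return to user mode from the frame built by callsys."""
    if (state["PS"] >> 3) & 1:
        raise RuntimeError("opDec fault: retsys issued from user mode")
    sp = state["SP"]
    tmp = memory[sp + 8]
    state["GP"] = memory[sp + 16]
    state["KSP"] = sp + 6 * 8
    state["SP"] = state["USP"]
    state["intr_flag"] = 0
    state["lock_flag"] = 0
    state["PS"] = 8                      # mode = user, IPL = 0
    state["PC"] = tmp
```

After the pair has run, SP, GP, and PC hold their user-mode values again and KSP is back at its original value.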
2.2.5 Return From Trap, Fault or Interrupt


Format: 


    rti                              ! PALcode format


Operation:

    if (PS<mode> EQ 1) then
        {Initiate opDec fault}
    endif
    tempps ← (SP+0)
    temppc ← (SP+8)
    GP ← (SP+16)
    a0 ← (SP+24)
    a1 ← (SP+32)
    a2 ← (SP+40)
    SP ← SP + {6 * 8}
    if (tempps<3> EQ 1) then
        KSP ← SP                     ! New mode is user
        SP ← USP
        tempps ← 8
    endif
    intr_flag = 0                    ! Clear the interrupt flag
    lock_flag = 0                    ! Clear the load lock flag
    PS ← tempps<3:0>                 ! Set new PS
    PC ← temppc


Exceptions: 


Opcode reserved to Digital 
Kernel stack not valid (halt) 


Mnemonics: 
rti Return from trap, fault, or interrupt 


Description: 


The return from fault, trap, or interrupt (rti) instruction pops registers (a0..a2 and
GP), the PC, and the PS from the kernel stack. If the new mode is user, the kernel
stack is saved and the user stack is restored.
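
The rti flow can be sketched the same way. The dict-based state and memory are assumptions of this illustration, and the kernel-stack-not-valid halt is not modeled. Note that a return to user mode always resumes with PS = 8 (user mode, IPL 0), whatever IPL bits the saved PS held.

```python
def rti(state: dict, memory: dict) -> None:
    """Pop the six-quadword exception frame and resume the saved context."""
    if (state["PS"] >> 3) & 1:
        raise RuntimeError("opDec fault: rti issued from user mode")
    sp = state["SP"]
    tempps = memory[sp + 0]
    temppc = memory[sp + 8]
    state["GP"] = memory[sp + 16]
    state["a0"] = memory[sp + 24]
    state["a1"] = memory[sp + 32]
    state["a2"] = memory[sp + 40]
    state["SP"] = sp + 6 * 8
    if (tempps >> 3) & 1:                # new mode is user
        state["KSP"] = state["SP"]
        state["SP"] = state["USP"]
        tempps = 8
    state["intr_flag"] = 0
    state["lock_flag"] = 0
    state["PS"] = tempps & 0xF
    state["PC"] = temppc
```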
2.2.6 Swap Process Context


Format: 
    swpctx                           ! PALcode format


Operation:

    if (PS<mode> EQ 1) then
        {Initiate opDec fault}
    endif
    (PCBB) ← SP                      ! Save current state
    (PCBB+8) ← USP
    tmp ← PCC
    tmp1 ← tmp<31:0> + tmp<63:32>
    (PCBB+24)<31:0> ← tmp1<31:0>

    v0 ← PCBB                        ! Return old PCBB
    PCBB ← a0                        ! Switch PCBB
    SP ← (PCBB)                      ! Restore new state
    USP ← (PCBB+8)
    oldPTBR ← PTBR
    PTBR ← (PCBB+16)
    tmp1 ← (PCBB+24)
    PCC<63:32> ← {tmp1 - tmp}<31:0>
    FEN ← (PCBB+40)
    if {process unique register implemented} then
        (v0+32) ← unique
        unique ← (PCBB+32)
    endif
    if {ASN implemented} then
        ASN ← tmp1<63:32>
    else
        if (oldPTBR NE PTBR) then
            {Invalidate all TB entries with ASM=0}
        endif
    endif
Exceptions: 


Opcode reserved to Digital 


Mnemonics: 
swpctx Swap process context 
Description: 


The swap process context (swpctx) instruction saves the current process data in
the current PCB. Then swpctx switches to the PCB passed in a0 and loads the
new process context. The old PCBB is returned in v0. On return from the swpctx
instruction, registers t0, t8..t11, and a0 are UNPREDICTABLE.
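
The PCC arithmetic in the swpctx flow is compact, so a worked sketch may help. It reads PCC<31:0> as a free-running counter and PCC<63:32> as a per-process offset, which is an interpretation assumed here and not stated in this chapter. The function names save_pcc and restore_pcc are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def save_pcc(pcc: int) -> int:
    """Value stored in (PCBB+24)<31:0>: tmp<31:0> + tmp<63:32>, modulo 2**32."""
    return ((pcc & MASK32) + (pcc >> 32)) & MASK32


def restore_pcc(pcc: int, saved: int) -> int:
    """Reload PCC<63:32> with {tmp1 - tmp}<31:0>, leaving PCC<31:0> unchanged."""
    offset = (saved - pcc) & MASK32
    return (offset << 32) | (pcc & MASK32)
```

With this arithmetic, the sum of the two PCC halves immediately after the new context is loaded equals the value that was saved when that process was last switched out.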
2.2.7 Swap IPL


Format: 
    swpipl                           ! PALcode format

Operation:

    if (PS<mode> EQ 1) then
        {Initiate opDec fault}
    endif
    v0 ← PS<IPL>
    PS<IPL> ← a0<2:0>


Exceptions: 

Opcode reserved to Digital 
Mnemonics: 

swpipl Swap IPL 


Description: 


The swap IPL (swpipl) instruction returns the current value of the PS<IPL> bits in
v0 and sets the IPL to the value passed in a0. On return from the swpipl instruction,
registers t0, t8..t11, and a0 are UNPREDICTABLE.
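
Because swpipl returns the previous IPL, it lends itself to a raise-then-restore pattern. The sketch below models the instruction on a dict-based state, which is an assumption of this illustration.

```python
def swpipl(state: dict, a0: int) -> int:
    """Set PS<IPL> to a0<2:0> and return the previous PS<IPL>."""
    if (state["PS"] >> 3) & 1:
        raise RuntimeError("opDec fault: swpipl issued from user mode")
    old_ipl = state["PS"] & 0x7
    state["PS"] = (state["PS"] & ~0x7) | (a0 & 0x7)
    return old_ipl
```

A caller might use it as follows: old = swpipl(state, 7), then run the protected code, then swpipl(state, old) to restore the earlier level.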
2.2.8 TB Invalidate


Format: 


    tbi                              ! PALcode format


Operation:

    if (PS<mode> EQ 1) then
        {Initiate opDec fault}
    endif
    case a0 begin
        1:                           ! tbisi
            {Invalidate ITB entry for va=a1}
            break;
        2:                           ! tbisd
            {Invalidate DTB entry for va=a1}
            break;
        3:                           ! tbis
            {Invalidate both ITB and DTB entry for va=a1}
            break;
        -1:                          ! tbiap
            {Invalidate all TB entries with ASM=0}
            break;
        -2:                          ! tbia
            {Flush all TBs}
            break;
        otherwise:
            break;
    endcase


Exceptions: 
Opcode reserved to Digital

Mnemonics:

tbi         TB (translation buffer) invalidate


Description:


The TB invalidate (tbi) instruction removes specified entries from the I and D
translation buffers (TBs) when the mapping changes. The tbi instruction removes
specific entry types based on a CASE selection of the value passed in register
a0. On return from the tbi instruction, registers t0, t8..t11, a0, and a1 are
UNPREDICTABLE.
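
The CASE selection on a0 can be illustrated with a toy model in which each translation buffer is a dict from virtual address to an entry holding its ASM bit. This representation is an assumption of the sketch, as is reading selector 3 as the combined single-entry ITB and DTB invalidate.

```python
def tbi(itb: dict, dtb: dict, a0: int, a1: int = 0) -> None:
    """Apply the tbi selection to two dicts mapping va -> {"asm": bool}."""
    if a0 == 1:                          # tbisi: single ITB entry
        itb.pop(a1, None)
    elif a0 == 2:                        # tbisd: single DTB entry
        dtb.pop(a1, None)
    elif a0 == 3:                        # tbis: single entry in both TBs
        itb.pop(a1, None)
        dtb.pop(a1, None)
    elif a0 == -1:                       # tbiap: every entry with ASM=0
        for tb in (itb, dtb):
            for va in [v for v, entry in tb.items() if not entry["asm"]]:
                del tb[va]
    elif a0 == -2:                       # tbia: flush everything
        itb.clear()
        dtb.clear()
    # Any other selector is ignored, as in the "otherwise" arm.
```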
2.2.9 Who Am I
Format:

    whami                            ! PALcode format

Operation:

    if (PS<mode> EQ 1) then
        {Initiate opDec fault}
    endif
    v0 ← whami


Exceptions: 


Opcode reserved to Digital 


Mnemonics: 


whami Who am I 


Description: 


The who am I (whami) instruction returns the processor number for the current
processor in v0. The processor number is in the range 0 to the number of processors
minus one (0..maxCPU-1) that can be configured in the system. On return from the
whami instruction, registers t0 and t8..t11 are UNPREDICTABLE.

2.2.10 Write System Entry Address


Format: 


    wrent                            ! PALcode format


Operation:

    if (PS<mode> EQ 1) then
        {Initiate opDec fault}
    endif
    case a1 begin
        0:                           ! Write the EntInt:
            entInt ← a0
            break;
        1:                           ! Write the EntArith:
            entArith ← a0
            break;
        2:                           ! Write the EntMM:
            entMM ← a0
            break;
        3:                           ! Write the EntIF:
            entIF ← a0
            break;
        4:                           ! Write the EntUna:
            entUna ← a0
            break;
        5:                           ! Write the EntSys:
            entSys ← a0
            break;
        otherwise:
            break;
    endcase;


Exceptions: 
Opcode reserved to Digital 


Mnemonics: 


wrent Write system entry address 


Description: 


The write system entry address (wrent) instruction determines the specific system
entry point based on a CASE selection of the value passed in register a1. The wrent
instruction then sets the virtual address of the specified system entry point to the
value passed in a0.

For best performance, all the addresses should be kseg addresses. (See Chapter 3
for a definition of kseg addresses.)


On return from the wrent instruction, registers t0, t8..t11, a0, and a1 are
UNPREDICTABLE.
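
The selection on a1 amounts to a table lookup. The sketch below models it on a dict-based state; the state representation and the tuple ENTRY_REGISTERS are assumptions of this illustration, while the register names and selector values come from the flow above.

```python
# Index in this tuple is the selector passed in a1.
ENTRY_REGISTERS = ("entInt", "entArith", "entMM", "entIF", "entUna", "entSys")


def wrent(state: dict, a0: int, a1: int) -> None:
    """Write a0 to the entry address register selected by a1."""
    if (state["PS"] >> 3) & 1:
        raise RuntimeError("opDec fault: wrent issued from user mode")
    if 0 <= a1 < len(ENTRY_REGISTERS):
        state[ENTRY_REGISTERS[a1]] = a0
    # Selectors outside 0..5 are ignored, as in the "otherwise" arm.
```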
2.2.11 Write Floating-Point Enable


Format: 


    wrfen                            ! PALcode format


Operation:

    if (PS<mode> EQ 1) then
        {Initiate opDec fault}
    endif
    FEN ← a0<0>
    (PCBB+40) ← a0 AND 1


Exceptions: 


Opcode reserved to Digital 


Mnemonics: 


wrfen Write floating-point enable 


Description: 


The write floating-point enable (wrfen) instruction writes bit zero of the value passed 
in a0 to the floating-point enable register. The wrfen instruction also writes the value 
for FEN to the PCB at offset (PCBB+40). On return from the wrfen instruction, 
registers t0, t8..t11, and a0 are UNPREDICTABLE. 
2.2.12 Write Kernel Global Pointer


Format:

    wrkgp                            ! PALcode format

Operation:

    if (PS<mode> EQ 1) then
        {Initiate opDec fault}
    endif
    KGP ← a0


Exceptions: 
Opcode reserved to Digital

Mnemonics:

wrkgp       Write kernel global pointer


Description:


The write kernel global pointer (wrkgp) instruction writes the value passed in a0 to
the kernel global pointer (KGP) internal register. The KGP is used to load the GP
on exceptions. On return from the wrkgp instruction, registers t0, t8..t11, and a0
are UNPREDICTABLE.

2.2.13 Write User Stack Pointer


Format: 
    wrusp                            ! PALcode format

Operation:

    if (PS<mode> EQ 1) then
        {Initiate opDec fault}
    endif
    USP ← a0


Exceptions: 

Opcode reserved to Digital 
Mnemonics: 

wrusp Write user stack pointer 


Description: 


The write user stack pointer (wrusp) instruction writes the value passed in a0 to the
user stack pointer. On return from the wrusp instruction, registers t0, t8..t11, and
a0 are UNPREDICTABLE.

2.2.14 Write System Value


Format: 


    wrval                            ! PALcode format


Operation:

    if (PS<mode> EQ 1) then
        {Initiate opDec fault}
    endif
    sysvalue ← a0


Exceptions: 


Opcode reserved to Digital 


Mnemonics: 


wrval Write system value 


Description: 


The write system value (wrval) instruction writes the value passed in a0 to a
64-bit system value register. The combination of wrval with the rdval instruction,
described in Section 2.2.3, allows access by the operating system to a 64-bit
per-processor value. On return from the wrval instruction, registers t0, t8..t11,
and a0 are UNPREDICTABLE.

2.2.15 Write Virtual Page Table Pointer


Format: 
    wrvptptr                         ! PALcode format


Operation:

    if (PS<mode> EQ 1) then
        {Initiate opDec fault}
    endif
    VPTPTR ← a0


Exceptions: 


Opcode reserved to Digital 


Mnemonics: 


wrvptptr Write virtual page table pointer 


Description:


The write virtual page table pointer (wrvptptr) instruction writes the pointer passed
in a0 to the virtual page table pointer register (VPTPTR). The VPTPTR is described
in Chapter 3. On return from the wrvptptr instruction, registers t0, t8..t11, and a0
are UNPREDICTABLE.

Revision 1.0, May 12, 1992


• First review distribution

Chapter 3
OSF/1 Memory Management (III)


3.1 Introduction 


3.2 Virtual Address Spaces 


A virtual address is a 64-bit unsigned integer that specifies a byte location within the
virtual address space. Implementations subset the supported address space to one
of four sizes (43, 47, 51, or 55 bits) as a function of page size. The minimal supported
virtual address size is 43 bits. If an implementation supports less than 64-bit virtual
addresses, it must check that all the VA<63:vaSize> bits are equal to VA<vaSize-1>.
This gives two disjoint ranges for valid virtual addresses. For example, for a
43-bit virtual address space, the valid virtual address ranges are 0..3FFFFFFFFFF (hex)
and FFFFFC0000000000..FFFFFFFFFFFFFFFF (hex). Accesses to virtual addresses
outside of an implementation's valid virtual address range cause an access-violation
fault.
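
The validity rule (every bit of VA<63:vaSize> must equal VA<vaSize-1>) can be checked as in the following sketch. The function name is_valid_va is hypothetical.

```python
MASK64 = (1 << 64) - 1


def is_valid_va(va: int, va_size: int) -> bool:
    """True if VA<63:vaSize> are all equal to VA<vaSize-1>."""
    va &= MASK64
    upper = va >> (va_size - 1)                    # bits 63 .. vaSize-1
    all_ones = (1 << (64 - va_size + 1)) - 1
    return upper == 0 or upper == all_ones
```

For a 43-bit implementation this accepts exactly the two ranges quoted above and rejects every address between them.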


The virtual address space is divided into 3 segments. The two bits
va<vaSize-1:vaSize-2> select a segment as shown in Table 3-1.


Table 3-1: Virtual Address Space Segments


VA<vaSize-1:vaSize-2>    Name    Mapping                       Access Control

0x                       seg0    Mapped via TB                 Programmed in PTE

10                       kseg    PA ← SEXT(VA<vaSize-3:0>)     Kernel Read/Write

11                       seg1    Mapped via TB                 Programmed in PTE


For kseg, the relocation, sharing, and protection are fixed. For seg0 and seg1, the
virtual address space is broken into pages, which are the units of relocation, sharing,
and protection. The page size ranges from 8 Kbytes to 64 Kbytes. Therefore, system
software should allocate regions with differing protection on 64 Kbyte virtual address
boundaries to ensure image compatibility across all Alpha implementations.


Memory management provides the mechanism to map the active part of the virtual 
address space to the available physical address space. The operating system controls 
the virtual-to-physical address mapping tables and saves the inactive (but used) 
parts of the virtual address space on external storage media. 
3.2.1 Segment Seg0 and Seg1 Virtual Address Format


The processor generates a 64-bit virtual address for each instruction and operand in
memory. A seg0 or seg1 virtual address consists of three level-number fields and a
byte_within_page field, as shown in Figure 3-1.


Figure 3-1: Virtual Address Format





The byte_within_page field can be either 13, 14, 15, or 16 bits depending on a
particular implementation. Thus, the allowable page sizes are 8 Kbytes, 16 Kbytes,
32 Kbytes, and 64 Kbytes. Each level-number field is 0-n bits long, where, for
example, n is 9 for an 8K page size. Level-number fields are the same size for a
given implementation.


The level-number fields are a function of the page size; all page table entries at any
given level do not exceed one page. The PFN field in the PTE is always 32 bits wide.
Thus, as the page size grows, the virtual and physical address sizes also grow.


In Table 3-2, the physical address column is the maximum physical address 
supported by the smaller of seg0/seg1 or kseg, as indicated. 


Table 3-2: Virtual Address Options


Page       Byte      Level     Virtual    Physical    Physical
Size       Offset    Size      Address    Address     Address
(bytes)    (bits)    (bits)    (bits)     (bits)      Limited by

8K         13        10        43         41          kseg

16K        14        11        47         45          kseg

32K        15        12        51         47          seg0/seg1

64K        16        13        55         48          seg0/seg1
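
The rows of Table 3-2 follow from a few facts in the text: a PTE is a quadword, each level of page table fits in one page, the PFN is 32 bits wide, and kseg maps vaSize-2 bits. The sketch below rederives the table under those assumptions. The function name address_options is hypothetical.

```python
PFN_BITS = 32   # width of the PFN field in a PTE


def address_options(page_size: int):
    """Return (offset bits, level bits, VA bits, PA bits, limiting segment)."""
    offset_bits = page_size.bit_length() - 1       # log2 of the page size
    level_bits = offset_bits - 3                   # one page of 8-byte PTEs
    va_bits = offset_bits + 3 * level_bits         # three level-number fields
    kseg_limit = va_bits - 2                       # kseg physical address field
    seg_limit = PFN_BITS + offset_bits             # PFN plus byte within page
    pa_bits = min(kseg_limit, seg_limit)
    limited_by = "kseg" if kseg_limit <= seg_limit else "seg0/seg1"
    return offset_bits, level_bits, va_bits, pa_bits, limited_by
```

Calling address_options with 8192, 16384, 32768, and 65536 reproduces the four rows of the table.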


3.2.2 Kseg Virtual Address Format 


The processor generates a 64-bit virtual address for each instruction and operand
in memory. A kseg virtual address consists of a segment select field with a value
of 10 (binary) and a physical address field. The segment select field is the two bits
va<vaSize-1:vaSize-2>. The physical address field is va<vaSize-3:0>.

Figure 3-2: Kseg Virtual Address Format


 63                                                                    0
+--------------------------+---------------------+------------------+
| SEXT(segment_select<1>)  | Segment Select=10   | Physical Address |
+--------------------------+---------------------+------------------+
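
Segment selection and the kseg physical address field can be expressed directly from the bit ranges above. In this sketch the function names are hypothetical, and the SEXT shown in Table 3-1 is not modeled; the code simply extracts va<vaSize-3:0>.

```python
def segment(va: int, va_size: int) -> str:
    """Classify a valid virtual address by va<vaSize-1:vaSize-2>."""
    select = (va >> (va_size - 2)) & 0b11
    if select == 0b10:
        return "kseg"
    if select == 0b11:
        return "seg1"
    return "seg0"                                  # select is 00 or 01


def kseg_physical_field(va: int, va_size: int) -> int:
    """Return the physical address field va<vaSize-3:0> of a kseg address."""
    if segment(va, va_size) != "kseg":
        raise ValueError("not a kseg address")
    return va & ((1 << (va_size - 2)) - 1)
```

With vaSize = 43, the address FFFFFC0000001000 (hex) lies in kseg and carries the physical address field 1000 (hex).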


Figure 3-3: Page Table Entry (PTE)





3.3 Physical Address Space 


Physical addresses are at most vaSize-2 bits. This allows all of physical memory
to be accessed via kseg. A processor may choose to implement a smaller physical
address space by not implementing some number of high-order bits. The two
most significant implemented physical address bits select a caching policy or
implementation-dependent type of address space. Implementations will use these
bits as appropriate for their systems. For example, in a workstation with a 30-bit
physical address space, bit <29> might select between memory and non-memory-like
regions, and bit <28> could enable or disable caching; see Common Architecture,
Chapter 5.


3.4 Memory Management Control 


Memory management is always enabled. Implementations must provide an
environment for PALcode to service exceptions and to initialize and boot the
processor. For example, PALcode might run with I-stream mapping disabled.


3.5 Page Table Entries 


The processor uses a quadword page table entry (PTE) to translate segQ and seg] 
virtual addresses to physical addresses. A PTE contains hardware and software 
control information and the physical page frame number (PFN). A PTE is a quadword 
with the following fields: 
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Bits 
63:32 


31:16 
15:14 
13 


12 


11:10 


Name. 


PFN | 


SW 


~ RSVO 


RSV1 


RSV2 
GH 


Table 3-3: Page Table Entry (PTE) Bit Summary 


Meaning 


Page frame number 


The PFN field always points to a page ieiiaee If V is set, the PFN 
is concatenated with the byte_within_page bits of the virtual address to | 
obtain the physical address. | 


Reserved for software. 
Reserved for hardware: SBZ. 


User write enable. 


This bit enables writes from user mode. If this bit is 0 and a store is 


attempted while in user mode, an access-violation fault occurs. This bit. 
is valid even when V=0. 


Kernel write enable. 


This bit enables writes from kernel mode. If this bit is 0 and a store is 
attempted while in kernel mode, an access-violation fault occurs. This 
bit is valid even when V=0. 


Reserved for hardware; SBZ. 


User read enable. 


This bit enables reads from user mode. If this bit is 0 and a load or 
instruction fetch is attempted while in user mode, an Access Violation 
occurs. This bit is valid even when V=0. 


Kernel read enable. 


This bit enables reads from kernel mode. If this bit is 0 and a load or 
instruction fetch is attempted while in kernel mode, an access-violation 
fault occurs. This bit is valid even when V=0. 


Reserved for hardware; SBZ. 
Granularity hint. 


Software may set these bits to a non-zero value to supply a hint to 
translation buffer implementations that a block of pages can be treated 
as a single larger page: 


1. A block is an aligned group of 8**N pages where N is the value of 
PTE<6:5>, e.f. a group of 1, 8, 64, or 512 pages starting at a virtual 
address with page_size + 3*N low-order zeros. 


2. The block is a group of physically contiguous pages that are aligned 
both virtually and physically. Within the block, the low 3*N bits of 
the PFNs describe the identity mapping and the high 32—3*N PFN | 
bits are all equal. 


3. Within the block, all PTEs have the same values for bits <15:0>. 
Hardware may use this hint to map the entire block with a single 
TB entry, instead of 8, 64, or 512 separare TB entries. 
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Table 3-3 (Cont.): Page Table Entry (PTE) Bit Summary 
Bits Name Meaning 


4 ASM Address space match.


When set, this PTE matches all address space numbers. For a given VA,
ASM must be set consistently in all processes, otherwise the address
mapping is UNPREDICTABLE.


3 FOE Fault on execute.


When set, a Fault on Execute exception occurs on an attempt to execute
any location in the page.


2 FOW Fault on write.


When set, a Fault on Write exception occurs on an attempt to write any
location in the page.


1 FOR Fault on read.


When set, a Fault on Read exception occurs on an attempt to read any
location in the page.


0 V Valid. 


Indicates the validity of the PFN field. When V is set the PFN field is 
valid for use by hardware. When V is clear, the PFN field is reserved 
for use by software. The V bit does not affect the validity of PTE<15:1> 
bits. 
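

The three granularity-hint conditions in the table above can be checked mechanically before software sets PTE<6:5>. The following is a minimal sketch, not part of the architecture: the function name and its arguments are hypothetical, and it assumes the PFN occupies PTE<63:32> and that the PTEs are supplied in ascending virtual page order.

```python
def gh_block_is_consistent(ptes, first_vpn):
    """Check the three granularity-hint conditions for one candidate block."""
    n = (ptes[0] >> 5) & 0x3          # N is the value of PTE<6:5>
    count = 8 ** n                    # a block holds 8**N pages
    low_mask = count - 1              # mask for the low 3*N bits of a page number
    if len(ptes) != count or first_vpn & low_mask:
        return False                  # wrong size, or not virtually aligned
    base_pfn = ptes[0] >> 32          # assumed PFN field, PTE<63:32>
    if base_pfn & low_mask:
        return False                  # not physically aligned
    low16 = ptes[0] & 0xFFFF          # bits <15:0> must match in every PTE
    for i, pte in enumerate(ptes):
        if (pte & 0xFFFF) != low16 or (pte >> 32) != base_pfn + i:
            return False              # mismatched bits or non-contiguous PFN
    return True
```

A block that fails any check should be left with a granularity hint of zero.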


3.5.1 Changes to Page Table Entries 


The operating system changes PTEs as part of its memory management functions. 
For example, the operating system may set or clear the V bit, change the PFN field 
as pages are moved to and from external storage media, or modify the software bits. 
The processor hardware never changes PTEs. 


Software must guarantee that each PTE is always consistent within itself. 
Changing a PTE one field at a time can cause incorrect system operation, such as 
setting PTE<V> with one instruction before establishing PTE<PFN> with another. 
Execution of an interrupt service routine between the two instructions could use an
address that would map using the inconsistent PTE. Software can solve this problem
by building a complete new PTE in a register and then moving the new PTE to the
page table by using an STQ instruction.
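

As an illustration of the build-then-store rule, the sketch below assembles a whole PTE value first and writes it with a single 8-byte store. It is a model only: the helper names are hypothetical, a Python bytearray stands in for the page table, and `struct.pack_into` stands in for the STQ instruction. The V<0> and KRE<8> positions follow this chapter; the KWE<12> position is an assumption of the sketch.

```python
import struct

PTE_V = 1 << 0      # valid
PTE_KRE = 1 << 8    # kernel read enable
PTE_KWE = 1 << 12   # kernel write enable (bit position assumed here)

def make_pte(pfn, flags):
    """Assemble a complete PTE: PFN in <63:32>, control bits in <15:0>."""
    return ((pfn & 0xFFFFFFFF) << 32) | (flags & 0xFFFF)

def store_pte(page_table, index, pte):
    """Write the finished PTE as one aligned quadword (models STQ)."""
    struct.pack_into("<Q", page_table, index * 8, pte)

# Usage: the PTE is never visible with V set but the PFN still unset.
table = bytearray(8192)
store_pte(table, 5, make_pte(0x1234, PTE_V | PTE_KRE | PTE_KWE))
```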


Multiprocessing makes the problem more complicated. Another processor could be
reading (or even changing) the same PTE that the first processor is changing. Such
concurrent access must produce consistent results. Software must use some form
of software synchronization to modify PTEs that are already valid. Whenever a
processor modifies a valid PTE, it is possible that other processors in a multiprocessor
system may have old copies of that PTE in their translation buffer. Software must
inform other processors of changes to PTEs. Hardware must ensure that aligned 
quadword reads and writes are atomic operations. Hardware must not cache invalid 
PTEs (PTEs with the V bit equal to 0) in translation buffers. See Section 3.8 for 
more information. 
3.6 Memory Protection


Memory protection is the function of validating whether a particular type of access
is allowed to a specific page from a particular access mode. Access to each page is
controlled by a protection code that specifies, for each access mode, whether read or
write references are allowed. The processor uses the following to determine whether
an intended access is allowed:


• The virtual address, which is used to either select kseg mapping or provide the
index into the page tables.


• The intended access type (read or write).
• The current access mode, based on the processor mode.


For protection checks, the intended access is read for data loads and instruction
fetches, and write for data stores.


3.6.1 Processor Access Modes 


There are two processor modes, user and kernel. The access mode of a running
process is stored in the processor status mode bit (PS<mode>).


3.6.2 Protection Code 


Every page in the virtual address space is protected according to its use. A program 
may be prevented from reading or writing portions of its address space. Associated 
with each page is a protection code that describes the accessibility of the page for 
each processor mode. 


For seg0 and seg1, the code allows a choice of read or write protection for each
processor mode. For each mode, access can be read/write, read-only, or no-access.
Read and write accessibility and the protection for each mode are specified
independently.


For kseg, the protection code is kernel read/write, user no-access. 


3.6.3 Access-Violation Faults 


An access-violation memory-management fault occurs if an illegal access is 
attempted, as determined by the current processor mode and the page's protection. 


3.7 Address Translation for Seg0 and Seg1


The page tables can be accessed from physical memory, or (to reduce overhead) can 
be mapped to a linear region of the virtual address space. The following sections 
describe both access methods. 


3.7.1 Physical Access for Seg0 and Seg1 PTEs 


Seg0 and seg1 address translation can be performed by accessing entries in a
three-level page table structure. The page table base register (PTBR) contains the physical
page frame number (PFN) of the highest level (level 1) page table. Bits <level1> of
the virtual address are used to index into the first level page table to obtain the
physical PFN of the base of the second level (level 2) page table. Bits <level2> of
the virtual address are used to index into the second level page table to obtain the
physical PFN of the base of the third level (level 3) page table. Bits <level3> of the
virtual address are used to index the third level page table to obtain the physical
PFN of the page being referenced. The PFN is concatenated with virtual address bits
<byte_within_page> to obtain the physical address of the location being accessed.


If part of any page table does not reside in a memory-like region, or does reside in 
nonexistent memory, the operation of the processor is UNDEFINED. 


If the first-level or second-level PTE is valid, the protection bits are ignored; the 
protection code in the third-level PTE is used to determine accessibility. If a first 
level or second level PTE is invalid, an access-violation fault occurs if the PTE<KRE> 
equals zero. An access-violation fault on a first-level or second-level PTE implies that 
all lower-level page tables mapped by that PTE do not exist. 


The algorithm to generate a physical address from a seg0 or seg1 virtual address
follows:


IF {SEXT(VA<vaSize-1:0>) neq VA} THEN
    { initiate access-violation fault}

level1_PTE ← ({PTBR * page_size} + {8 * VA<level1>})    ! Read physical
IF level1_PTE<V> EQ 0 THEN
    IF level1_PTE<KRE> eq 0 THEN
        { initiate access-violation fault}
    ELSE
        { initiate translation-not-valid fault}

level2_PTE ← ({level1_PTE<PFN> * page_size} + {8 * VA<level2>})    ! Read physical
IF level2_PTE<V> EQ 0 THEN
    IF level2_PTE<KRE> eq 0 THEN
        { initiate access-violation fault}
    ELSE
        { initiate translation-not-valid fault}

level3_PTE ← ({level2_PTE<PFN> * page_size} + {8 * VA<level3>})    ! Read physical

IF {{{level3_PTE<UWE> eq 0} AND {write access} AND {ps<mode> EQ 1}} OR
    {{level3_PTE<URE> eq 0} AND {read access} AND {ps<mode> EQ 1}} OR
    {{level3_PTE<KWE> eq 0} AND {write access} AND {ps<mode> EQ 0}} OR
    {{level3_PTE<KRE> eq 0} AND {read access} AND {ps<mode> EQ 0}}}
THEN
    {initiate memory-management fault}
ELSE
    IF level3_PTE<V> EQ 0 THEN
        {initiate memory-management fault}

IF {level3_PTE<FOW> eq 1} AND {write access} THEN
    {initiate memory-management fault}

IF {level3_PTE<FOR> eq 1} AND {read access} THEN
    {initiate memory-management fault}

IF {level3_PTE<FOE> eq 1} AND {execute access} THEN
    {initiate memory-management fault}

Physical_address ← {level3_PTE<PFN> * page_size} OR VA<byte_within_page>
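

The walk above can be modeled in a few lines of Python. This is a hedged sketch rather than a definition of the hardware: it assumes an 8 KB page size (so each level field is 10 bits wide), assumes the enable bits sit at KRE<8>, URE<9>, KWE<12>, and UWE<13>, and omits the fault-on-read, write, and execute checks for brevity. `read_quad` is a hypothetical callback that returns the quadword at a physical address.

```python
PAGE_BITS = 13                          # assumed 8 KB pages
LEVEL_BITS = PAGE_BITS - 3              # each page table page holds pageSize/8 PTEs
VA_BITS = PAGE_BITS + 3 * LEVEL_BITS    # implemented virtual address width

class MMFault(Exception):
    """Raised with the fault name where the pseudocode initiates a fault."""

def sext(value, bits):
    """Sign-extend the low `bits` bits of value to 64 bits."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value |= ((1 << 64) - 1) & ~((1 << bits) - 1)
    return value

def translate(read_quad, ptbr, va, write=False, user=False):
    if sext(va, VA_BITS) != va:
        raise MMFault("access violation")
    table_pfn = ptbr
    for level in (1, 2, 3):
        shift = PAGE_BITS + (3 - level) * LEVEL_BITS
        index = (va >> shift) & ((1 << LEVEL_BITS) - 1)
        pte = read_quad((table_pfn << PAGE_BITS) + 8 * index)
        if level < 3 and not pte & 1:               # invalid level 1 or 2 PTE
            kre = (pte >> 8) & 1
            raise MMFault("translation not valid" if kre else "access violation")
        table_pfn = pte >> 32                       # PFN for the next step
    # Pick the enable bit for this access: KRE, URE, KWE, or UWE (assumed positions).
    enable_bit = {(False, False): 8, (False, True): 9,
                  (True, False): 12, (True, True): 13}[(write, user)]
    if not (pte >> enable_bit) & 1:
        raise MMFault("access violation")
    if not pte & 1:
        raise MMFault("translation not valid")
    return (table_pfn << PAGE_BITS) | (va & ((1 << PAGE_BITS) - 1))
```

As in the pseudocode, the protection check on the level 3 PTE is made before the valid check, so an access violation takes precedence over a translation-not-valid fault.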


3.7.2 Virtual Access for Seg0 or Seg1 PTEs


The page tables can be mapped into a linear region of the virtual address space,
reducing the overhead for seg0 and seg1 PTE accesses. The mapping is done as
follows:
1. Select a 2**(3*lg(pageSize/8)+3) byte-aligned region (an address with 3*lg(pageSize/8)+3
low-order zeros) in the seg0 or seg1 address space. Set the virtual page table
pointer (VPTPTR) with a write virtual page table pointer instruction (wrvptptr)
to the selected value.


2. Create a level1 PTE to map the page tables as follows.


    level1_PTE = 0                                ! Initialize all fields to 0
    level1_PTE<63:32> = pfn_of_level1_pagetable   ! Set the PFN to the PFN of the level 1 page table
    level1_PTE<8> = 1                             ! Set the kernel read enable bit
    level1_PTE<0> = 1                             ! Set the valid bit

3. Set the level1 page table entry that corresponds to the VPTB to the created
level1_PTE.


4. Set all level 1 and level 2 valid PTEs to allow kernel read access. With this setup
in place the algorithm to fetch a seg0 or seg1 PTE is:

    tmp ← left_shift (va, {64 - {{lg(pageSize) * 4} - 9}})
    tmp ← right_shift (tmp, {64 - {{lg(pageSize) * 4} - 9} + lg(pageSize) - 3})
    tmp ← VPTB OR tmp
    tmp<2:0> ← 0
    level3_PTE ← (tmp)    ! Load PTE using its virtual address


The virtual access method is used by PALcode for most TB fills. 
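

The shift sequence above reduces to "virtual page number times 8, ORed into VPTB". The sketch below reproduces it with Python integers; the function name is hypothetical, the 64-bit mask models register width, and the default page size of 8 KB (lg = 13) is an assumption.

```python
MASK64 = (1 << 64) - 1

def pte_virtual_address(va, vptb, lg_page_size=13):
    """Return the virtual address of the level 3 PTE that maps va."""
    va_bits = lg_page_size * 4 - 9                  # implemented VA width
    tmp = (va << (64 - va_bits)) & MASK64           # discard the sign-extension bits
    tmp >>= (64 - va_bits) + lg_page_size - 3       # leaves (virtual page number * 8)
    tmp |= vptb                                     # offset into the mapped page tables
    return tmp & ~0x7                               # clear tmp<2:0>
```

With 8 KB pages the left shift is 21 bits and the right shift is 31 bits, so consecutive virtual pages map to consecutive quadwords in the virtual page table.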


3.8 Translation Buffer 


In order to save actual memory references when repeatedly referencing the
same pages, hardware implementations include a translation buffer to remember
successful virtual address translations and page states. When the process context
is changed, a new value is loaded into the address space number (ASN) internal
processor register with a swap process context (swpctx) instruction. This causes
address translations for pages with PTE<ASM> clear to be invalidated on a processor
that does not implement address space numbers.


Additionally, when the software changes any part (except the software field) of a
valid PTE, it must also execute a CALL_PAL tbi instruction. The entire translation
buffer can be invalidated by tbia, and all ASM=0 entries can be invalidated by tbiap.
The translation buffer must not store invalid PTEs. Therefore, the software is not 
required to invalidate translation buffer entries when making changes for PTEs that 
are already invalid. 


3.9 Address Space Numbers 


The Alpha architecture allows a processor to optionally implement address space
numbers (process tags) to reduce the need for invalidation of cached address
translations for process specific addresses when a context switch occurs.
The supported address space number (ASN) range is 0..MAX_ASN. MAX_ASN is
provided in the HWRPB MAX_ASN field.


The address space number for the current process is loaded by software in the
address space number (ASN) with a swpctx instruction. ASNs are processor
specific and the hardware makes no attempt to maintain coherency across multiple
processors. In a multiprocessor system, software is responsible for ensuring the
consistency of TB entries for processes that might be rescheduled on different
processors.


Systems that support ASNs should have MAX_ASN in the range 13..65535. The
number of ASNs should be determined by the market a system is targeting.


PROGRAMMING NOTE
System software should not assume that the number
of ASNs is a power of two. This allows, for example,
hardware to use N TB tag bits to encode (2**N)-3 ASN
values, one value for ASM=1 PTEs, and one for invalid.


There are several possible ways of using ASNs. There
are several complications in a multiprocessor system.
Consider the case where a process that executed on
processor-1 is rescheduled on processor-2. If a page
is deleted or its protection is changed, the TB in
processor-1 has stale data. One solution would be to
send an interprocessor interrupt to all the processors on 
which this process could have run and cause them to 
invalidate the changed PTE. This results in significant 
overhead in a system with several processors. Another 
solution would be to have software invalidate all TB 
entries for a process on a new processor before it can 
begin execution, if the process executed on another 
processor during its previous execution. This ensures 
the deletion of possibly stale TB entries on the new 
processor. A third solution would assign a new ASN 
whenever a process is run on a processor that is not the 
same as the last processor on which it ran. 


3.10 Memory-Management Faults 


On a memory-management fault, the fault code (MMCSR) is passed in a1 to specify
the type of fault encountered, as shown in Table 3-4.


Table 3-4: Memory-Management Fault Type Codes
Fault                    MMCSR value


Translation not valid    0
Access violation         1
Fault on read            2
Fault on execute         3
Fault on write           4


• A translation-not-valid fault is taken when a read or write reference is attempted
through an invalid PTE in a first, second, or third-level page table.


• An access-violation fault is taken on a reference to a seg0 or seg1 address if
the protection field of the third-level PTE that maps the data indicates that the
intended page reference would be illegal in the specified access mode. An
access-violation fault is also taken if the KRE bit is a zero in an invalid first or second
level PTE. An access-violation fault is generated for any access to a kseg address
when the mode is user (PS<mode> EQ 1).


• A fault-on-read (FOR) fault occurs when a read is attempted with PTE<FOR>
set.


• A fault-on-execute (FOE) fault occurs when an instruction fetch is attempted
with PTE<FOE> set.


• A fault-on-write (FOW) fault occurs when a write is attempted with PTE<FOW>
set.
3.11 Revision History
Revision 1.0, May 12, 1992


• First review distribution
Chapter 4
OSF/1 Process Structure (III)


4.1 Process Definition 


A process is a single thread of execution. It is the basic entity that can be scheduled
and is executed by the processor. A process consists of an address space and both
software and hardware context. The hardware context of a process is defined by
the following:


• 30 integer registers (excluding R31 and SP)
• 31 floating-point registers (excluding F31)
• The program counter (PC)
• The two per-process stack pointers (USP/KSP)
• The processor status (PS)
• The address space number (ASN)
• The process cycle counter (PCC)
• The page table base register (PTBR)
• The process unique value (unique)

This information must be loaded if a process is to execute.


While a process is executing, some of its hardware context is being updated in the
internal registers. When a process is not being executed, its hardware context is 
stored in memory in a software structure termed the process control block (PCB). 
Saving the process context in the PCB and loading new values from another PCB for 
a new context is termed context switching. Context switching occurs as one process 
after another is scheduled for execution. 


4.2 Process Control Block (PCB) 
As shown in Figure 4-1, the PCB holds the state of a process.


The contents of the PCB are loaded and saved by the swpctx instruction. The PCB
must be quadword aligned and should be 64 byte aligned for best performance.
Kernel mode code can read the PTBR, the ASN, and the FEN for the current process
from the PCB. Kernel mode code must use the rdusp/wrusp instructions to access
the USP. The PCC must be read with the rpcc instruction. The unique value can be
accessed with the rdunique and wrunique instructions.
Figure 4-1: Process Control Block (PCB)

63                            32 31                             0
Kernel Stack Pointer (KSP)                                        :00
User Stack Pointer (USP)                                          :08
Page Table Base Register (PTBR)                                   :16
Address Space Number (ASN)     | Cycle Counter (PCC)              :24
Process Unique Value (unique)                                     :32
Floating-Point Enable (FEN)                                       :40
Reserved to Digital                                               :48
Reserved to Digital                                               :56
4.3 Revision History
Revision 1.0, May 12, 1992


• First review distribution
Chapter 5
OSF/1 Exceptions and Interrupts (III) 


5.1 Introduction 


At certain times during the operation of a system, events within the system require
the execution of software outside the explicit flow of control. When such an event
occurs, an Alpha processor forces a change in control flow from that indicated by the
current instruction stream. The notification process for such an event is either an
exception or an interrupt.


5.1.1 Exceptions 


Exceptions are relevant primarily to the currently executing process. Exception
service routines execute in response to exception conditions caused by software. All
exception service routines execute in kernel mode on the kernel stack. Exception
conditions consist of faults, arithmetic traps, and synchronous traps:


• A fault occurs during an instruction and leaves the registers and memory in
a consistent state such that elimination of the fault condition and subsequent 
reexecution of the instruction gives correct results. Faults are not guaranteed to 
leave the machine in exactly the same state it was in immediately prior to the 
fault, but rather in a state such that the instruction can be correctly executed if 
the fault condition is removed. The PC saved in the exception stack frame is the 
address of the faulting instruction. An rti instruction to that PC reexecutes the 
faulting instruction. 


• An arithmetic trap occurs at the completion of the operation that caused the
exception. Since several instructions may be in various stages of execution at any 
point in time, it is possible for multiple arithmetic traps to occur simultaneously. 


The PC that is saved in the exception frame on traps is that of the next 
instruction that would have been issued if the trapping conditions had not 
occurred. However, that PC is not necessarily the address of the instruction 
immediately following the instructions that encountered the trap condition. 
Further, intervening instructions may have changed operands or other state used 
by the instructions encountering the trap conditions. 


An rti instruction to that PC does not reexecute the trapping instructions, nor
does it reexecute any intervening instructions; it simply continues execution from
the point at which the trap was taken.


In general, it is difficult to fix up results and continue program execution at the
point of an arithmetic trap. Software can force a trap to be continued more easily
without the need for complicated fixup code. This is accomplished by following a
set of code generation restrictions in the code that could cause arithmetic traps
which are to be completed by a software trap handler (see Common Architecture,
Chapter 4), including specifying the /S software completion modifier in each such
instruction.


The AND of all the software completion modifiers for trapping instructions is
provided to the arithmetic trap handler in the exception summary SWC bit. If
the SWC is set, a trap handler may find the trigger instruction by scanning
backward from the trap PC until each register in the register write mask has
been an instruction destination. The trigger instruction is the first instruction in
the I-stream order to get a trap within a trap shadow. (See Common Architecture,
Chapter 4 for a definition of trap shadow.) If the SWC bit is clear, no fixup is
possible.


• A synchronous trap occurs at the completion of the operation that caused the
exception. No instructions can be issued between the completion of the operation
that caused the exception and the trap.


5.1.2 Interrupts


The processor arbitrates interrupt requests. When the interrupt priority level (IPL)
of an outstanding interrupt is greater than the current IPL, the processor raises IPL
to the level of the interrupt and dispatches to entInt, the interrupt entry to the OS.
Interrupts are serviced in kernel mode on the kernel stack. Interrupts can come
from one of four sources: I/O devices, the clock, performance counters, or machine
checks.


5.2 Processor Status 


The processor status (PS) is a four-bit register that contains the current mode
(PS<mode>) in bit <3> and a three-bit interrupt priority level (PS<IPL>) in bits
<2..0>. The PS<mode> bit is zero for kernel mode and one for user mode. The
PS<IPL> bits are always zero if the mode is user and can be 0 to 7 if the mode is
kernel. The PS is changed when an interrupt or exception is initiated and by the
rti, retsys, and swpipl instructions.
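

The PS layout just described can be unpacked as follows. This is an illustrative sketch; the function name is hypothetical, and the consistency check simply restates the rule that PS<IPL> is zero in user mode.

```python
def decode_ps(ps):
    """Split the four-bit PS into (mode, ipl)."""
    mode = "user" if (ps >> 3) & 1 else "kernel"   # PS<mode> is bit <3>
    ipl = ps & 0x7                                  # PS<IPL> is bits <2..0>
    if mode == "user" and ipl != 0:
        raise ValueError("PS<IPL> must be zero in user mode")
    return mode, ipl
```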


The uses of the PS values are shown in Table 5-1.


Table 5-1: Processor Status Summary

PS<mode>  PS<IPL>  Mode    Use

1         0        User    User software
0         0        Kernel  System software
0         1        Kernel  System software
0         2        Kernel  System software
0         3        Kernel  Low priority device interrupts
0         4        Kernel  High priority device interrupts
0         5        Kernel  Clock and interprocessor interrupts
0         6        Kernel  Real time devices
0         7        Kernel  Machine checks


5.3 Stack Frames 


There are two types of system entries: those for the callsys instruction and those for
exceptions and interrupts. Both types use the same stack frame layout, as shown in
Figure 5-1. The stack frame contains space for the PC, the PS, the saved GP, and
the saved registers a0, a1, a2. On entry, the SP points to the saved PS.


The callsys entry saves the PC, the PS, and the GP. The exception and interrupt
entries save the PC, the PS, the GP, and also save the registers a0..a2.


Figure 5-1: Stack Frame Layout 
5.4 System Entry Addresses


All system entries are in kernel mode. The interrupt priority PS bits (PS<IPL>) are 
set as shown in the following table. The system entry point address is set by the 
CALL_PAL wrent instruction, as described in Section 2.2.10. 

Table 5-2: Entry Point Address Registers

Entry Point  Value in a0        Value in a1    Value in a2    PS<IPL>

entArith     Exception summary  Register mask  UNPREDICTABLE  Unchanged
entIF        Fault type code    UNPREDICTABLE  UNPREDICTABLE  Unchanged
entInt       Interrupt type     Vector         UNPREDICTABLE  Priority of interrupt
entMM        VA                 MMCSR          Cause          Unchanged
entSys       p0                 p1             p2             Unchanged
entUna       VA                 Opcode         Src/Dst        Unchanged


5.4.1 System Entry Arithmetic Trap (entArith) 


The arithmetic trap entry, entArith, is called when an arithmetic trap occurs. On
entry, a0 contains the exception summary register and a1 contains the exception
register write mask. Section 5.4.1.1 describes the exception summary register and
Section 5.4.1.2 describes the register write mask.


5.4.1.1 Exception Summary Register


The exception summary register, shown in Figure 5-2 and described in Table 5-3,
records the various types of arithmetic exceptions that can occur together. Those
types of exceptions are listed and described in Table 5-3.


Figure 5-2: Exception Summary Register

<63:7>  <6>  <5>  <4>  <3>  <2>  <1>  <0>
ZERO    IOV  INE  UNF  OVF  DZE  INV  SWC


Table 5-3: Exception Summary Register Bit Definitions
Bit Description

0 Software completion (SWC) 


Is set when all of the other arithmetic exception bits were set by floating-operate 
instructions with the /S software completion trap modifier set. See Common 
Architecture, Chapter 4 for rules about setting the /S modifier in code that may cause 
an arithmetic trap, and Section 5.1.1 for rules about using the SWC bit in a trap 
handler. 
1 Invalid operation (INV)


An attempt was made to perform a floating arithmetic, conversion, or comparison
operation, and one or more of the operand values were illegal.


An INV trap is reported for most floating-point operate instructions with an input
operand that is an IEEE NaN, IEEE infinity, or IEEE denormal.


Floating invalid operation traps are always enabled. If this trap occurs, the result
register is written with an UNPREDICTABLE value.


2 Division by zero (DZE)


An attempt was made to perform a floating divide operation with a divisor of zero.


A DZE trap is reported when a finite number is divided by zero. Floating divide by
zero traps are always enabled. If this trap occurs, the result register is written with
an UNPREDICTABLE value.


3 Overflow (OVF)


A floating arithmetic or conversion operation overflowed the destination exponent.


An OVF trap is reported when the destination's largest finite number is exceeded in
magnitude by the rounded true result. Floating overflow traps are always enabled. If
this trap occurs, the result register is written with an UNPREDICTABLE value.


4 Underflow (UNF)


A floating arithmetic or conversion operation underflowed the destination exponent.


An UNF trap is reported when the destination's smallest finite number exceeds in
magnitude the non-zero rounded true result. Floating underflow trap enable can be
specified in each floating-point operate instruction. If underflow occurs, the result
register is written with a true zero.


5 Inexact result (INE)


A floating arithmetic or conversion operation gave a result that differed from the
mathematically exact result.


An INE trap is reported if the rounded result of an IEEE operation is not exact. Inexact
result trap enable can be specified in each IEEE floating-point operate instruction. The
rounded result value is stored in all cases.


6 Integer overflow (IOV)


An integer arithmetic operation or a conversion from floating to integer overflowed the
destination precision.


An IOV trap is reported for any integer operation whose true result exceeds the
destination register size. Integer overflow trap enable can be specified in each
arithmetic integer operate instruction and each floating-point convert-to-integer
instruction. If integer overflow occurs, the result register is written with the truncated
true result.
5.4.1.2 Exception Register Write Mask


The exception register write mask parameter records all registers that were targets
of instructions that set the bits in the exception summary register. There is a
one-to-one correspondence between bits in the register write mask quadword and the
register numbers. The quadword records, starting at bit 0 and proceeding right
to left, which of the registers r0 through r31, then f0 through f31, received an
exceptional result.
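

A trap handler typically begins by decoding the two entArith arguments. The sketch below shows one way to do so; the function names are hypothetical, and the bit order of the summary flags (SWC in bit 0 up to IOV in bit 6) is assumed from the order in which Table 5-3 lists them.

```python
# Assumed bit order for the exception summary register, bit 0 first.
EXC_SUM_BITS = ["SWC", "INV", "DZE", "OVF", "UNF", "INE", "IOV"]

def decode_exc_sum(a0):
    """Return the names of the flags set in the exception summary (a0)."""
    return [name for bit, name in enumerate(EXC_SUM_BITS) if (a0 >> bit) & 1]

def registers_in_write_mask(a1):
    """Return the registers flagged in the register write mask (a1)."""
    names = [f"r{i}" for i in range(32)] + [f"f{i}" for i in range(32)]
    return [names[bit] for bit in range(64) if (a1 >> bit) & 1]
```

Bits 0 through 31 of the mask name r0 through r31, and bits 32 through 63 name f0 through f31, so an overflow into F3 sets bit 35.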


NOTE
For a sequence such as:


ADDF F1,F2,F3
MULF F4,F5,F3


if the add overflows and the multiply does not, the OVF
bit is set in the exception summary, and the F3 bit is
set in the register mask, even though the overflowed
sum in F3 can be overwritten with an in-range product
by the time the trap is taken. (This code violates the
destination reuse rule for software completion. See
Common Architecture, Chapter 4 for the destination
reuse rules.)


The PC value saved in the exception stack frame is the virtual address of the next 
instruction. This is defined as the virtual address of the first instruction not executed 
after the trap condition was recognized. 


5.4.2 System Entry Instruction Fault (entIF)


The instruction fault entry is called for bpt, bugchk, gentrap, opDec, and for a FEN
fault (floating-point instruction when the floating-point unit is disabled, FEN EQ 0).
On entry, a0 contains a 0 for a bpt, a 1 for bugchk, a 2 for gentrap, a 3 for FEN fault,
and a 4 for opDec. No additional data is passed in a1..a2. The saved PC at (SP+00)
is the address of the instruction that caused the fault for FEN faults. The saved
PC at (SP+00) is the address of the instruction after the instruction that caused the
fault for bpt, bugchk, gentrap, and opDec faults.


5.4.3 System Entry Hardware Interrupts (entInt)


The interrupt entry is called to service a hardware interrupt or a machine check.
Table 5-4 shows what is passed in a0..a2 and the PS<IPL> setting for various
interrupts.


Table 5-4: System Entry Hardware Interrupts

Entry Type                Value in a0  Value in a1       Value in a2             PS<IPL>

Interprocessor interrupt  0            UNPREDICTABLE     UNPREDICTABLE           5
Clock                     1            UNPREDICTABLE     UNPREDICTABLE           5
Machine check             2            Interrupt vector  Pointer to logout area  7
I/O device interrupt      3            Interrupt vector  UNPREDICTABLE           Level of device
Performance counter       4            Interrupt vector  UNPREDICTABLE           6

On entry to the hardware interrupt routine, the IPL has been set to the level of the
interrupt. For hardware interrupts, register a1 contains a platform-specific interrupt
vector. That platform-specific interrupt vector is typically the same value as the SCB
offset value that would be returned if the platform was running OpenVMS PALcode.


For a machine check, a2 contains the kseg address of the logout area. The first 4
longwords of the logout area are implementation-independent. The rest of the logout
area is system specific. The first longword of the logout area is a machine check in
progress flag. If the flag is non-zero when a machine check is being initiated, a
double machine check halt is initiated instead. The machine check handler needs to
clear the machine check in progress flag when it can handle a new machine check.
Figure 5-3 describes the logout area.


Figure 5-3: Logout Area 
An Implementation-Dependent Number of Quadwords of Additional State    :16


5.4.4 System Entry MM Fault (entMM) 


The memory-management fault entry is called when a memory management
exception occurs. On entry, a0 contains the faulting virtual address and a1 contains
the MMCSR (see Section 3.10). On entry, a2 is set to a minus one (-1) for an
instruction fetch fault, to a plus one (+1) for a fault caused by a store instruction,
or to a 0 for a fault caused by a load instruction.
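

For illustration, the three entMM arguments can be turned into a readable description as sketched below. The function and table names are hypothetical. The MMCSR codes follow Section 3.10, and a2 is treated as a 64-bit register value, so minus one arrives as all ones and is reinterpreted as a signed quantity.

```python
MMCSR_NAMES = {0: "translation not valid", 1: "access violation",
               2: "fault on read", 3: "fault on execute", 4: "fault on write"}
CAUSES = {-1: "instruction fetch", 0: "load", 1: "store"}

def describe_mm_fault(a0, a1, a2):
    """Describe a memory-management fault from the entMM register arguments."""
    cause = a2 - (1 << 64) if a2 >> 63 else a2     # reinterpret a2 as signed
    return f"{MMCSR_NAMES[a1]} at {a0:#x} during {CAUSES[cause]}"
```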





5.4.5 System Entry Call System (entSys)


The system call entry is called when a callsys instruction is executed in user mode.
On entry, only registers (t8..t11) have been modified. The PC+4 of the callsys
instruction, the user global pointer, and the current PS are saved on the kernel
stack. Additional space for a0..a2 is allocated. After completion of the system service
routine, the kernel code executes a CALL_PAL retsys instruction.


5.4.6 System Entry Unaligned Access (entUna) 


The unaligned access entry is called when a load or store access is not aligned. On
entry, a0 contains the faulting virtual address, a1 contains the zero extended six-bit
opcode (bits <31:26>) of the faulting instruction, and a2 contains the zero extended
data source or destination register number (bits <25:21> of the faulting instruction).
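

The a1 and a2 values described above are plain bit fields of the 32-bit instruction word. A minimal sketch of the extraction follows; the function name is hypothetical, and the sample opcode value used in the usage line is an assumption chosen only to exercise the fields.

```python
def una_arguments(instruction):
    """Return (opcode, register) as passed in a1 and a2 on entry to entUna."""
    opcode = (instruction >> 26) & 0x3F     # bits <31:26>, zero extended
    register = (instruction >> 21) & 0x1F   # bits <25:21>, zero extended
    return opcode, register

# Usage: a word with opcode 0x29 and register field 3.
word = (0x29 << 26) | (3 << 21) | (4 << 16) | 8
fields = una_arguments(word)
```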


5.5 PALcode Support 


5.5.1 Stack Writeability and Alignment 


PALcode only accesses the kernel stack. Any PALcode accesses to the kernel stack
that would produce a memory-management fault will result in a
kernel-stack-not-valid halt. The stack pointer must always point to a quadword-aligned address. If
the kernel stack is not quadword aligned on a PALcode access, a
kernel-stack-not-valid halt is initiated.
5.6 Revision History
Revision 1.0, May 12, 1992


• First review distribution
Platforms (IV)


This part describes an architected platform implementation and contains the
following chapters:


• Chapter 1, Console Subsystem Overview and Operator Interface (IV)
• Chapter 2, Console Interface to Operating System Software (IV)
• Chapter 3, System Bootstrapping (IV)
Chapter 1
Console Subsystem Overview and Operator Interface (IV)


On an Alpha system, underlying control of the system platform hardware is provided
by a “console”¹. The console:


1. Initializes, tests, and prepares the system platform hardware for Alpha system 
software. 


2. Bootstraps (loads into memory and starts the execution of) system software. 


3. Controls and monitors the state and state transitions of each processor in a
multiprocessor system. 


4. Provides services to system software which simplify system software control of 
and access to platform hardware. 


5. Provides a means for a “console operator” to monitor and control the system. 


The console interacts with system platform hardware to accomplish the first three.
The actual mechanisms of these interactions are specific to the platform
hardware; however, the net effects are common to all systems. Chapter 3 describes
these functions.


The console interacts with system software once control of the system platform hard- 
ware has been transferred to that software. Chapter 2 discusses the basic functions 
of a console and its interaction with Alpha system software. 


The console interacts with the console operator through a virtual display device 
or “console terminal”. The console operator may be a human or a management 
application. The console terminal forms the interface between the console and a 
console presentation layer. The functions of that presentation layer and the display 
formats are described in Section 1.3. 


In an Alpha multiprocessor system, there is one primary processor and one or more 
secondary processors. The primary is the processor that:


1. Can legally refer to the console I/O devices,

2. Can legally send characters to the console terminal,

3. Can legally receive characters from the console terminal,

4. Has direct access to a BB_WATCH on the system,

5. Is named in response to an inquiry as to which processor is primary.


All other processors in the system are secondary processors.


¹ A term shrouded in the antiquity of computing. So named because this mechanism was first realized as a desklike panel
of switches and blinking lights.


1.1 Console Implementations 


The actual implementation of an Alpha console varies from system to system. Regardless
of implementation, the console on each system provides the functionality
described in this chapter and in Chapters 2 and 3. The console may be implemented
as:


• “Embedded” or co-resident in the hardware platform complex which contains the
  processors.

• “Detached” or resident on a separate and distinct hardware platform.

• Any hybrid of the above.


The distinction is somewhat arbitrary. A detached console may have cooperating
special code which executes on one of the processors; an embedded console may have
a cooperating management application which executes on a remote machine.


Regardless of the actual implementation, each console must provide:


1. A virtual display device, the default “console terminal”. 


This device allows the console operator to issue commands and receive displays. 
In the absence of hardware errors and with the proper console-lock setting, the 
default console terminal device provides reliable communication with the rest of 
the console. 


2. Reliable access to console functionality by system software and the console oper- 
ator. 


All console functionality must appear to be resident within the console at all 
times. All console functions must be accessible in a timely manner, without prior
notification, and with sufficient reliability. 


3. Secure communications with system software and the console operator. 


All console communication paths must be able to be made secure by either physical
measures or encryption methods.


4. A mechanism by which the console can gain control of a processor executing 
system software. 


This mechanism must preserve the execution state of system software; it must 
be possible for the console to gain control of the processor, and subsequently 
continue system software execution successfully. 


5. A mechanical mechanism which locks the console. 


The console lock may be a keyswitch, jumper, or any other implementation- 
specific mechanism; see Section 1.2. The lock is either “locked” or “unlocked”. 





1.1.1 Console Implementation Registry 


This chapter and Chapters 2 and 3 specify required console functions. Some of these
functions have attributes which may vary with console implementation; consoles 
may also extend beyond the required functions. Console functions or attributes 
which may vary with implementation are: 


• Supported CTBs

• Supported environment variables

• Environment variable value formats, such as BOOT_DEV or BOOT_OSFLAGS

• Configuration Data Block format

• Supported callback routines

• Supported bootstrap media

• Implementation-specific HALT codes or messages


Functions implemented by current consoles are summarized in Appendix E. \Also
see that appendix for information on how to register a function.\


1.2 Console Lock Mechanisms 
TBD in a subsequent ECO. 


1.3 Console Presentation Layer 


The console presentation layer is TBD in a subsequent ECO. This text assumes the 
following command syntax: 


• BOOT (bootstrap the system)

• CONTINUE (continue execution)

• START -CPU (start a given secondary)

• INITIALIZE (initialize system)

• INITIALIZE -CPU (initialize a given processor)

• HALT -CPU (force a given processor into console I/O mode)

• HALT -CRASH (cause a given processor to initiate a crash)


1.4 Messages 


The console generates a binary message code to the console presentation layer to 
signal messages, such as audit trail or error messages. The console presentation 
layer interprets the binary code into something meaningful to the console operator. 
Table 1-1 summarizes the binary message codes, symbol names, and the expected 
translation into English. 
Table 1-1: Console Error Messages


Code₁₆   Symbol                        English Interpretation

         AUDIT_BOOT_STARTS             Audit trail of booting begins
         AUDIT_BSTRAP_GOOD             Bootstrap checksum matches
         AUDIT_BSTRAP_ACCESSIBLE       Bootstrap image accessible
         AUDIT_CHECKSUM_GOOD           Boot block checksum matches
         AUDIT_LOAD_BEGINS             Loading of bootstrap begins
         AUDIT_LOAD_DONE               Loading of bootstrap done
         AUDIT_TAPE_ANSI               Verified as ANSI tape
         AUDIT_FILE_FOUND<filename>    Found <filename>
         AUDIT_TAPE_BBLOCK             Verified as bootblocked tape
         AUDIT_BOOT_TYPE<string>       Bootstrap type <string>
         AUDIT_BOOT_REQ<filename>      Requesting bootstrap <filename>
         AUDIT_BSERVER_FOUND           Remote server located
         AUDIT_BSTRAP_ABORT            Bootstrap load abort
E-3F                                   reserved
40       ?PALREQ?                      PALcode load request
41       ?STARTREQ?                    Secondary start request
42-7F                                  reserved
80       ERROR_BOOT_ABORT              Unable to Boot
81       ERROR_PROCINIT                Unable to Initialize Processor
82-FFF                                 reserved
other                                  console implementation-specific


1.5 Implementation Considerations 


1.5.1 Console Implementations 




\ This chapter and Chapters 2 and 3 attempt to standardize across all console im- 
plementations, the dissimilar options, functions, and features, that were not men- 
tioned by DEC STD 032. The lack of standardization for VAX systems presented 
VAX software and Digital Field Service with too many different interfaces. \ 


The goal of the Alpha console architecture is to promote a consistent interface across
all Alpha systems. Some console functionality is inherently implementation-specific
and cannot be required of all Alpha systems; some may be applicable to more than
one Alpha system. To prevent the proliferation of interfaces and achieve commonality
of function whenever possible, the Alpha console architecture requires that:


1. Any console function which is visible to system software which is not specified 
by these chapters must be registered with the Alpha architecture group. 


2. Any console function which is visible to an on-site or remote console operator 
(including Field Service engineers) which is not specified by these chapters must
be registered with the Alpha architecture group.


3. Whenever possible, implementations must use previously registered functions 
rather than inventing new variations. 


Console functions intended for use solely by development engineering or expert- 
level repair and diagnosis are excluded from the above. See Appendix E for registry 
information. 


1.5.2 Security 


The means by which the console achieves a secure communications path with sys- 
tem software and with the console operator is implementation-specific. Embedded 
consoles inherently have the capability of secure communications with system soft- 
ware. Detached consoles can achieve this security by residing in the same room as 
the Alpha system and communicating with it over a private connection. Detached 
consoles can also achieve security by using an encrypted protocol over a shared con- 
nection. This latter method allows a workstation over a network to function as the 
console. 


1.5.3 Internationalization 


Wherever possible, console implementations should support the goals of
internationalization:


1. Each message has a binary message code. The console presentation layer inter- 
prets the code into a meaningful message display of the appropriate language 
and characters. 


2. Consoles should avoid explicitly interpreting character set encoding (such as 
ISO-LATIN-1). Character strings are to be viewed as simple byte strings. Thus,
the GETC console callback routine supports from one to four byte character en- 
codings depending on the currently selected language and character set; the 
PUTS routine outputs only a byte stream. 


3. ASCII strings are used in certain fields of the HWRPB and certain interprocessor 
communications due to DEC Standard 12 and to present a common interface to 
system software. 


4. The currently selected character-set encoding and language to be used for the
console terminal are defined by the CHAR_SET and LANGUAGE environment
variables.


5. The end of a character string passed between the console and the operating 
system as an argument to a console callback routine is determined by passing 
its length. 
6. Console callback routines should be written to be independent from character-set
encoding and language. At a minimum, every implementation must support
ISO-LATIN-1 character-set encodings. The supported character-set encodings
are determined by platform product requirements.


The console presentation layer is independent of the required console functionality
interface.


1.5.4 ISO-LATIN-1 Support 


Implementations supporting the ISO-LATIN-1 character-set encoding must have the
following properties:


1. The GETC console callback routine returns a one byte character; see Section 2.3.4.


2. The PROCESS_KEYCODE console callback routine returns a one byte character;
see Section 2.3.4.


English console presentation layers are strongly encouraged to use the actual
values as defined in Table 2-5, rather than inventing aliases.





1.6 \REVISION HISTORY 
Revision 5.0, May 12, 1992 
1. Reorganized according to SRM Rev 5 requirements 
2. Converted to SDML
3. Replace previous Console Chapter with Console ECO #15 
4. Includes 3 chapters and two appendices, renumber I/O Chapter 
5. Material substantially changed or rearranged 
Chapter 2
Console Interface to Operating System Software (IV)


This chapter describes the interactions between the console subsystem and system
software. These services depend on state which is shared between the console and
system software. That shared state is contained in the “Hardware Restart Parameter
Block” (HWRPB) and a number of “environment variables”. The HWRPB is a
data structure which is directly accessed by both the console and system software;
the environment variables are indirectly accessed by system software. Section 2.1
describes the HWRPB; Section 2.2 describes the environment variables. The service,
or “callback”, routines provided by the console to system software are given in
Section 2.3. Communication between the console and system software is described
in Section 2.4. Functions implemented by registered consoles are summarized in
Appendix E. Various implementation considerations are given in Section 2.5.


2.1 Hardware Restart Parameter Block (HWRPB) 


The Hardware Restart Parameter Block (HWRPB) is a page-aligned data structure 
shared between the console and system software. The HWRPB is a critical resource 
during bootstraps, powerfail recoveries, and other restart situations. The fields of 
the HWRPB are shown in Figure 2—1 and described in Table 2—1. 


The console creates the HWRPB and the required per-CPU, CTB, CRB, and MEMDSC 
offset blocks as a physically contiguous structure during console initialization. Fields 
within the HWRPB and the required offset blocks are updated by the console and 
system software during and after system bootstrapping. The console must be able 
to locate the HWRPB and the required offset blocks at all times. Neither the console 
nor system software may move the HWRPB or the required offset blocks to different 
physical memory locations; subsequent operation of the system is UNDEFINED if
such an attempt is made. 


The HWRPB and the required offset blocks must comprise a virtually contiguous 
structure at all times. Prior to transferring control to system software, the console 
maps the HWRPB and the required offset blocks into contiguous addresses beginning 
at virtual address 0000 0000 1000 0000₁₆ in the initial bootstrap address space.
If system software subsequently changes this virtual mapping, any new mapping
must preserve the relative offsets of all fields and blocks; all physically contiguous
pages must remain virtually contiguous. Note that some of the data structures 
located by HWRPB fields need not be contiguous with the HWRPB. Those structures 
which may be discontiguous are the optional CONFIG Block, the optional FRU Table, 
the PALcode space(s), the logout area(s), the CRB pages, and the memory bitmaps 
located by the MEMDSC Table. 
Figure 2-1: HWRPB Overview


[Figure: block diagram. The HWRPB holds General Information, the Per-CPU Offset,
CTB Table Offset, CRB Offset, MEMDSC Offset, CONFIG Offset, FRU Table Offset, and
the Restart Routine Linkage Pair. These fields locate the CONFIG Table, the FRU
Table, the CPU Restart Routine, the Per-CPU Slots (with PALcode Pointers and
Logout Area Pointers that locate the PALcode Spaces and CPU Logout Areas), the
Console Terminal Block (CTB) Table, the Console Routine Block (CRB) with its CRB
Map Entries and CRB Pages, and the Memory Data Descriptor Table with bitmap
pointers #1 through #n that locate the Cluster #1 through Cluster #n Bitmaps.]





All offset blocks must be at least quadword aligned. The starting address of an offset
block is determined by adding the contents of the HWRPB offset field to the starting
address of the HWRPB. For example, the starting address of the MEMDSC block is
given by:


    = HWRPB address + MEMDSC OFFSET
    = HWRPB address + (HWRPB[200])
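
The address calculation above can be sketched as follows. This is an illustration only: the function name, the sample image, and the sample values are hypothetical, and the sketch assumes that offset fields are stored as unsigned little-endian quadwords.

```python
import struct

MEMDSC_OFFSET_FIELD = 200  # HWRPB[200], the MEMDSC OFFSET field


def offset_block_address(hwrpb_image, hwrpb_address, field_offset):
    """Return the starting address of the block located by one HWRPB offset field."""
    # Assumption: offset fields are unsigned little-endian quadwords.
    (block_offset,) = struct.unpack_from("<Q", hwrpb_image, field_offset)
    return hwrpb_address + block_offset


# Hypothetical image: only the MEMDSC offset field is filled in.
image = bytearray(312)
struct.pack_into("<Q", image, MEMDSC_OFFSET_FIELD, 0x1400)
memdsc = offset_block_address(image, 0x10000000, MEMDSC_OFFSET_FIELD)
print(hex(memdsc))
```

The same helper applies to any of the other offset fields; the HWRPB address passed in may be physical or virtual as long as the result is interpreted in the same address space.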


The total size of the HWRPB and the required offset blocks is on the order of 8KB to 
16KB. The size is contained in the HWRPB_SIZE field at HWRPB[24]. The required 
offset blocks may be offset from the HWRPB in any order; the HWRPB offset fields 
must not be used to infer the size of the HWRPB nor any offset block. 
Figure 2-2: Hardware Restart Parameter Block Structure


63                                                     0
Physical Address of the HWRPB                          :HWRPB
“HWRPB”                                                :+08
HWRPB Revision                                         :+16
HWRPB Size                                             :+24
Primary CPU ID                                         :+32
Page Size (Bytes)                                      :+40
Number of PA Bits                                      :+48
Maximum Valid ASN                                      :+56
System Serial Number (SSN)                             :+64
System Type                                            :+80
System Variation                                       :+88
System Revision                                        :+96
Interval Clock Interrupt Frequency                     :+104
Cycle Counter Frequency                                :+112
Virtual Page Table Base                                :+120
Reserved for Architecture Use                          :+128
Offset to Translation Buffer Hint Block                :+136
Number of Processor Slots                              :+144
Per-CPU Slot Size                                      :+152
Offset to Per-CPU Slots                                :+160
Number of CTBs                                         :+168
CTB Size                                               :+176
Offset to Console Terminal Block Table                 :+184
Offset to Console Callback Routine Block               :+192
Offset to Memory Data Descriptor Table                 :+200
Offset to Configuration Data Block (If Present)        :+208
Offset to FRU Table (If Present)                       :+216
Virtual Address of Terminal Save State Routine         :+224
Procedure Value of Terminal Save State Routine         :+232
Virtual Address of Terminal Restore State Routine      :+240
Procedure Value of Terminal Restore State Routine      :+248
Virtual Address of CPU Restart Routine                 :+256
Procedure Value of CPU Restart Routine                 :+264
Reserved for System Software                           :+272
Reserved for Hardware                                  :+280
HWRPB Checksum                                         :+288
RXRDY Bitmask                                          :+296
TXRDY Bitmask                                          :+304
Translation Buffer Hint Block                          :+(HWRPB[136])
Per-Processor Slots                                    :+(HWRPB[160])
Console Terminal Block Table                           :+(HWRPB[184])
Console Callback Routine Block                         :+(HWRPB[192])
Memory Data Descriptor Table                           :+(HWRPB[200])


Table 2-1: HWRPB Fields

Offset          Description


HWRPB           HWRPB PA¹

                Starting physical address of the HWRPB field. This field is used by the
                console to validate the HWRPB.

+08             HWRPB VALIDATION¹

                Quadword containing “HWRPB<0><0><0>” (0000 0042 5052 5748₁₆). This
                field is used by the console to validate the HWRPB.




+16             HWRPB REVISION¹

                Format of the HWRPB. See Section 2.1.1. Assigned values are referenced
                to the revision level of this chapter:

                Version   Interpretation

                0         Reserved
                1         Revision 1.1-2.1 \ADU only\
                2         Revision 3.0
                3         Revision 3.3 \ECO #30\
                other     Reserved for future use

+24             HWRPB SIZE¹

                Size in bytes of the HWRPB and required physically contiguous per-CPU,
                CTB, CRB, and MEMDSC offset blocks. Unsigned field.

+32             PRIMARY CPU ID¹,⁴

                WHAMI of the primary processor. System software modifies this field only
                at primary switch; see Section 3.4.6. Unsigned field.

+40             PAGE SIZE¹

                Number of bytes within a page for this Alpha processor implementation.
                Unsigned field.

+48             PA SIZE¹

                Size of the physical address space in bits for this Alpha processor
                implementation. PA SIZE must be 48 bits or less; see OpenVMS Section,
                Chapter 3 and Common Architecture, Chapter 5. Unsigned field.

+56             MAX VALID ASN¹

                Maximum ASN value allowed by this Alpha processor implementation.
                Unsigned field.

+64             SYSTEM SERIAL NUMBER¹

                Full DEC STD 12 serial number for this Alpha System. This octaword
                field contains a 10 character ASCII serial number determined at the time
                of manufacture; see DEC STD 12 for format information.

+80             SYSTEM TYPE¹

                Family or system hardware platform. Assigned values are summarized in
                Appendix D; see Section 2.1.1. Unsigned field.
+88             SYSTEM VARIATION²,⁴

                Subtype variation of the system. This may include whether the system has
                optional features such as multiprocessor support or special power supply
                conditioning. Assigned values are summarized in Appendix D; see
                Section 2.1.1.

+96             SYSTEM REVISION CODE¹,⁴

                DEC STD 12 revision field for this Alpha system. Four ASCII characters.
                May be modified by system software or application software.

+104            INTERVAL CLOCK INTERRUPT FREQUENCY¹

                Number of interval clock interrupts per second (scaled by 4096) in this
                Alpha system. Interrupts occur only if enabled; see OpenVMS Section,
                Chapter 6. Unsigned field.

+112            CYCLE COUNTER FREQUENCY¹

                Number of SCC and PCC updates per second in this Alpha system. See
                the RPCC and PAL RSCC instructions. Unsigned field.

+120            VIRTUAL PAGE TABLE BASE²,⁴

                Virtual address of the base of the entire three-level page table structure;
                see OpenVMS Section, Chapter 6. The console sets this field to the virtual
                address of the L1PTE within bootstrap address space at system bootstraps
                and restores VPTB IPR with this value at all processor restarts. System
                software is responsible for updating this field whenever the VPTB IPR is
                modified. See Sections 3.3.1.3, 3.3.3.5, and 3.4.2.

+128            Reserved

                Reserved for architecture use; SBZ.

+136            TB HINT OFFSET¹

                Unsigned offset to the starting address of the Translation Buffer Hint Block
                (TBB). See Section 2.1.2.

+144            NUMBER OF PER-CPU SLOTS¹

                Number of per-CPU slots present. See Section 2.1.3 for the per-CPU slot
                format. Unsigned field.

+152            PER-CPU SLOT SIZE¹

                Size in bytes of each per-CPU slot rounded up to the next integer multiple
                of 128. See Section 2.1.3. Unsigned field.

+160            CPU SLOT OFFSET¹

                Unsigned offset to the first per-CPU slot in the HWRPB. See Section 2.1.3.





+168            NUMBER OF CTB²

                Number of Console Terminal Blocks (CTBs) contained in the CTB Table.
                See Section 2.3.8.2. Unsigned field.

+176            CTB SIZE¹

                Size in bytes of the largest Console Terminal Block (CTB) contained in the
                CTB Table. See Section 2.3.8.2. Unsigned field.

+184            CTB OFFSET¹

                Unsigned offset to the starting address of the Console Terminal Block
                (CTB) Table. See Section 2.3.8.2.

+192            CRB OFFSET¹

                Unsigned offset to the starting address of the Console Callback Routine
                Block (CRB). See Section 2.3.8.1.

+200            MEMDSC OFFSET¹

                Unsigned offset to the starting address of the Memory Data Descriptor
                (MEMDSC) Table. See Section 3.3.1.1.

+208            CONFIG OFFSET¹

                Unsigned offset to the starting address of the Configuration Data Table
                (CONFIG). If zero, no CONFIG Table exists. See Section 2.1.4.

+216            FRU TABLE OFFSET¹

                Unsigned offset to the starting address of the Field Replaceable Unit (FRU)
                Table. If zero, no FRU Table exists. See Section 2.1.5.

+224            SAVE_TERM RTN VA²,⁴

                Starting virtual address of a routine which saves console terminal state.
                This routine is optionally provided by system software. See Section 3.4.7.
                Set to zero by the console at system bootstraps.

+232            SAVE_TERM VALUE²,⁴

                Procedure value of the SAVE_TERM routine optionally provided by system
                software. The console copies this value into R27 before invoking the
                routine; see Section 3.4.7. Set to zero by the console at system bootstraps.

+240            RESTORE_TERM RTN VA²,⁴

                Starting virtual address of a routine which restores console terminal state.
                This routine is optionally provided by system software. See Section 3.4.7.
                Set to zero by the console at system bootstraps.
+248            RESTORE_TERM VALUE²

                Procedure value of the RESTORE_TERM routine optionally provided by
                system software. The console copies this value into R27 before invoking the
                routine; see Section 3.4.7. Set to zero by the console at system bootstraps.

+256            RESTART RTN VA²,⁴

                Starting virtual address of a CPU restart routine provided by system
                software. The console restarts system software by transferring control to this
                routine. See Section 3.4. Set to zero by the console at system bootstraps.

+264            RESTART VALUE²,⁴

                Procedure value of the CPU restart routine provided by system software.
                During the restart process, the console copies this value into R27 before
                transferring control to the CPU restart routine. See Section 3.4. Set to
                zero by the console at system bootstraps.

+272            RESERVED FOR SYSTEM SOFTWARE²

                Reserved for use by system software. Set to zero by the console at system
                bootstraps.

+280            RESERVED FOR HARDWARE¹

                Reserved for use by hardware.

+288            HWRPB CHECKSUM²

                Checksum of all the quadwords of the HWRPB from offset [00] to [118]
                inclusive. Computed as a 64-bit, 2’s complement sum ignoring overflows.
                Used to validate the HWRPB during warm bootstraps, restarts, and
                secondary starts. Set by console initialization; recomputed and updated
                whenever a HWRPB field with offset [00] to [118] inclusive is modified by the
                console or system software.

+296            RXRDY BITMASK²,⁴

                Secondary receive bitmask for interprocessor console communications. When
                transmitting a command to a secondary, the primary processor sets the
                RXRDY bit which corresponds to the CPU ID of the secondary. The number
                of active bits in this field is determined by the number of per-CPU slots
                in HWRPB[144]. See Section 2.4. All bits are initialized as clear.

+304            TXRDY BITMASK²,⁴

                Secondary transmit bitmask for interprocessor console communications.
                When transmitting a message to the primary, the secondary processor sets
                the TXRDY bit which corresponds to its CPU ID and requests an
                interprocessor interrupt to the primary. The number of active bits in this field
                is determined by the number of per-CPU slots in HWRPB[144]. See
                Section 2.4. All bits are initialized as clear.





+(HWRPB[136])   TB HINT BLOCK²,⁴

                Quadword-aligned block that describes the characteristics of the
                translation buffer (TB) granularity hints. See Section 2.1.2.

+(HWRPB[160])   Per-CPU SLOTS²,⁴

                128 Byte-aligned slots which describe each processor in the system. See
                Section 2.1.3.

+(HWRPB[184])   CTB TABLE¹

                Quadword-aligned Console Terminal Block Table. Set at console
                initialization; modified by console terminal callbacks. See Section 2.3.8.2.

+(HWRPB[192])   CONSOLE CALLBACK ROUTINE BLOCK²,⁴

                Quadword-aligned block that describes the location and mapping of the
                console callback routines. Set at system bootstrap; modified by console
                FIXUP callback. See Section 2.3.8.1.

+(HWRPB[200])   MEMDSC¹,⁴

                Quadword-aligned Memory Data Descriptor Table. Set at console
                initialization; preserved across warm bootstraps. See Section 3.3.1.1.


¹ Initialized by the console at cold system bootstrap only. Preserved unchanged by the console at all warm system
bootstraps.

² Initialized by the console at all system bootstraps (cold or warm).

⁴ May be modified by system software.
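

The validation fields described in Table 2-1 (HWRPB PA, HWRPB VALIDATION, and HWRPB CHECKSUM) can be illustrated with a minimal sketch. The function names and the sample image are hypothetical. The sketch assumes little-endian quadwords and reads the offset range "[00] to [118]" as hexadecimal, so that it ends at offset 280, the quadword immediately below the checksum field at +288; the manual does not state the radix explicitly.

```python
import struct

HWRPB_VALIDATION = 0x0000004250525748  # "HWRPB" followed by three zero bytes
CHECKSUM_OFFSET = 288                  # HWRPB CHECKSUM field
MASK64 = (1 << 64) - 1


def hwrpb_checksum(image):
    """Sum the quadwords at offsets 0 through 280, ignoring overflow."""
    # Assumption: "[118]" is hexadecimal (0x118 = 280), giving 36 quadwords,
    # and quadwords are stored little-endian.
    quadwords = struct.unpack_from("<36Q", image, 0)
    return sum(quadwords) & MASK64


def hwrpb_looks_valid(image, physical_address):
    """Check the HWRPB PA, HWRPB VALIDATION, and HWRPB CHECKSUM fields."""
    pa, ident = struct.unpack_from("<2Q", image, 0)
    (stored,) = struct.unpack_from("<Q", image, CHECKSUM_OFFSET)
    return (pa == physical_address
            and ident == HWRPB_VALIDATION
            and stored == hwrpb_checksum(image))


# Hypothetical HWRPB image with only the validation-related fields filled in.
image = bytearray(312)
struct.pack_into("<2Q", image, 0, 0x2000, HWRPB_VALIDATION)
struct.pack_into("<Q", image, CHECKSUM_OFFSET, hwrpb_checksum(image))
print(hwrpb_looks_valid(image, 0x2000))
```

Masking the sum to 64 bits yields the same bit pattern as a 2's complement sum that ignores overflow. Any code that modifies a covered field would need to recompute and store the checksum afterward, as the table entry requires.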


2.1.1 Revision, Type, and Variation Fields 


The HWRPB contains several revision, type, and variation fields which describe the 
Alpha system platform hardware and PALcode. System software uses these fields to 
identify hardware-dependent support code which must be loaded or enabled. These 
fields are examined early in operating system bootstrap; if one of the fields contains a
value which is unrecognized or incompatible with the operating system, the bootstrap
attempt fails. Diagnostic software uses these fields to guide field installation and 
upgrade procedures and for material and parts control. 


In multiprocessor systems, the processor type and PALcode revisions need not be 
identical for all processors. System software uses these fields to determine if multi- 
processor operation is viable. This evaluation may be performed by the running primary,
the starting secondary, or a combination of both. For example, see Section 3.3.3.3.
The fields include:


1. HWRPB Revision - HWRPB[16]


This field identifies the format of the HWRPB. Since the HWRPB is shared be- 
tween the console and system software, both must agree on the field offsets, 
formats, and interpretations. 


2. System Type and System Variation - HWRPB[80] and HWRPB[88]


These fields identify the Alpha system platform. System software infers attributes
such as physical address offsets and I/O device locations from the system
type.


3. System Revision - HWRPB[96]


This field identifies the system platform hardware revision.


4. Processor Type and Processor Variation - SLOT[176] and SLOT[184]


These per-CPU slot fields identify each Alpha processor and its capabilities. The
Processor Type field contains two sub-fields. The major type sub-field identifies
the processor implementation \ (such as EV-3 or EV-4) \ ; the minor type sub-field
identifies any system-specific attributes (such as local memory or cache size).


5. Processor Revision - SLOT[192]


This per-CPU slot field identifies the processor hardware revision.


6. PALcode Revision - SLOT[168]


This field identifies the PALcode revision required and/or in use by the processor.
System software uses the PALcode variation and PALcode compatibility
sub-fields. The variation subfield indicates whether the PALcode image includes
extensions or functional variations necessary to a given operating system or
application.


PROGRAMMING NOTE 
For example, a PALcode variation may contain a dif- 
ferent TB fill routine. System software uses the com- 
patibility subfield to ensure that all processors in 
a multiprocessor system are using compatible PAL- 
code images. 


\ PALcode revisions are specific to the system platform and processor major type. 
The filename of distributed PALcode images must contain sufficient information 
to distinguish the intended system platform and processor. \ 


2.1.2 Translation Buffer Hint Block 




The Translation Buffer Hint Block (TBB) contains information on the characteristics
of the instruction stream translation buffer (ITB) and data stream translation buffer
(DTB) granularity hints (GH). All processors in a single Alpha system must
implement the same granularity hints.


The TBB consists of 8 quadwords, 4 for each of the translation buffers (ITB and 
DTB). The 4 quadwords contain 16 word fields; each word contains the number of 
entries in the translation buffer that implement a combination of granularity hints 
(including none). 










Table 2-2: Granularity Hint Fields

Offset₁₆   Granularity Hint

0          None
2          1 page
4          8 pages
6          1 and 8 pages
8          64 pages
A          1 and 64 pages
C          8 and 64 pages
E          1, 8, and 64 pages
10         512 pages
12         1 and 512 pages
14         8 and 512 pages
16         1, 8, and 512 pages
18         64 and 512 pages
1A         1, 64, and 512 pages
1C         8, 64, and 512 pages
1E         1, 8, 64, and 512 pages
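

The offsets in Table 2-2 follow a regular pattern: each of the four hint sizes (1, 8, 64, and 512 pages) corresponds to one bit of a 4-bit combination, and the word offset is twice that combination. The sketch below illustrates this reading of the table. The function names are hypothetical, and the sketch assumes that each word is an unsigned little-endian 16-bit value and that offsets are relative to the start of the four quadwords for one translation buffer.

```python
import struct

GH_PAGES = (1, 8, 64, 512)  # hint sizes, in the bit order implied by Table 2-2


def tbb_field_offset(hints):
    """Return the byte offset of the word for a combination of granularity hints."""
    mask = 0
    for pages in hints:
        mask |= 1 << GH_PAGES.index(pages)  # raises ValueError for an unknown size
    return 2 * mask


def tbb_entry_count(tb_words, hints):
    """Read the number of TB entries that implement the given hint combination."""
    # Assumption: each field is an unsigned little-endian 16-bit word.
    (count,) = struct.unpack_from("<H", tb_words, tbb_field_offset(hints))
    return count


# Hypothetical 4-quadword group for one translation buffer.
tb_words = bytearray(32)
struct.pack_into("<H", tb_words, 0x14, 4)
print(tbb_entry_count(tb_words, (8, 512)))
```

An empty combination selects offset 0, the word that counts entries implementing no granularity hint.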


2.1.3 Per-CPU Slots in the HWRPB 


Information on the state of a processor is contained in a “per-CPU slot” data structure
for that processor. The per-CPU slots form a contiguous array indexed by CPU ID.
The starting address of the first per-CPU slot is given by the offset HWRPB[160]
relative to the starting address of the HWRPB. The number of per-CPU slots is
given in HWRPB[144]. Each per-CPU slot must be 128B aligned to ensure natural
alignment of the HWPCB at SLOT[0]. The slot size, rounded up to the nearest
multiple of 128 bytes, is given in HWRPB[152].


CPU IDs are determined in an implementation-specific manner. The only require- 
ment is that they be in the range of zero to the maximum number of processors the 
particular platform supports minus one. 


SOFTWARE NOTE 


OpenVMS Alpha supports CPU IDs in the range 0-31 only.


Each per-CPU slot contains information necessary to bootstrap, start, restart, or halt
the processor. The format is shown in Figure 2-3 and Table 2-3. The HWPCB specifies
the context in which the loaded system software will execute; see OpenVMS Section,
Chapter 4 for more information.

The console must initialize the per-CPU slot for the primary processor prior to system
bootstrap. The per-CPU slot fields for secondary processors are set by a combination
of the console and system software. The console sets the halt information at
error halts and prior to processor restarts.


Slots corresponding to nonexistent processors are zeroed. There may be more per- 
CPU slots than are necessary in any given Alpha system. A system implementation 
may reserve HWRPB space for processors which are not present at system bootstrap. 


An Alpha system may support internally different, yet software compatible, PALcode
for different processors in a multiprocessor implementation. Each per-CPU
slot contains a PALcode memory descriptor which locates the PALcode used by that
processor. See Section 3.3.1.2 for information on PALcode loading and initialization
on the primary processor and Section 3.3.3.3 for information on PALcode loading
and initialization on secondary processors.


The starting address of a per-CPU slot is calculated by: 


Slot Address = {CPU ID * slot size} + offset + HWRPB base

             = {CPU ID * HWRPB[152]} + HWRPB[160] + #HWRPB


The address may be physical or virtual. 
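
As an illustration of the slot address calculation, the following Python sketch reads the three HWRPB quadwords named above from a byte image of the HWRPB and applies the formula. The helper names and the idea of working from a byte image are assumptions made for the example. Alpha quadwords are little-endian.

```python
def read_quadword(image, offset):
    """Read a little-endian 64-bit quadword from a byte image."""
    return int.from_bytes(image[offset:offset + 8], "little")


def per_cpu_slot_address(hwrpb_image, hwrpb_base, cpu_id):
    """Return the starting address of the per-CPU slot for cpu_id.

    hwrpb_image is a bytes-like copy of the HWRPB; hwrpb_base is the
    (physical or virtual) address at which the HWRPB starts.
    """
    slot_count = read_quadword(hwrpb_image, 144)   # HWRPB[144]
    slot_size = read_quadword(hwrpb_image, 152)    # HWRPB[152]
    slot_offset = read_quadword(hwrpb_image, 160)  # HWRPB[160]
    if not 0 <= cpu_id < slot_count:
        raise ValueError("CPU ID has no per-CPU slot")
    return cpu_id * slot_size + slot_offset + hwrpb_base
```

The result is in the same address space (physical or virtual) as the HWRPB base that is passed in.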





Figure 2-3: Per-CPU Slot in HWRPB 


63                                                   0

Bootstrap/Restart HWPCB                               :SLOT
Per-CPU State Flag Bits                               :+128
PALcode Memory Length                                 :+136
PALcode Scratch Length                                :+144
Physical Address of PALcode Memory Space              :+152
Physical Address of PALcode Scratch Space             :+160
PALcode Revision Required by Processor                :+168
Processor Type                                        :+176
Processor Variation                                   :+184
Processor Revision                                    :+192
Processor Serial Number                               :+200
Physical Address of Logout Area                       :+216
Logout Area Length                                    :+224
Halt PCBB                                             :+232
Halt PC                                               :+240
Halt PS                                               :+248
Halt Argument List (R25)                              :+256
Halt Return Address (R26)                             :+264
Halt Procedure Value (R27)                            :+272
Reason for Halt                                       :+280
Reserved for Software                                 :+288
Interprocessor Console Buffer Area                    :+296
Reserved for Architecture Use                         :+464
                                                      :+512

Table 2-3: Per-CPU Slot Fields

Offset  Description

SLOT    HWPCB³,⁶
        Hardware Privileged Context Block for this processor. See OpenVMS Section,
        Chapter 4 for the structure of the HWPCB; see Table 3-6 for the contents as set
        by the console.

+128    STATE FLAGS³,⁶
        Current state of this processor. See Table 2-4 for the interpretation of each bit.

+136    PALCODE MEMORY SPACE LENGTH¹,²
        Number of bytes required by this processor for PALcode memory. Unsigned field.

+144    PALCODE SCRATCH SPACE LENGTH¹,²
        Number of bytes required by this processor for PALcode scratch space. Unsigned
        field.

+152    PA OF PALCODE MEMORY SPACE¹,²,⁶
        Starting physical address of PALcode memory space for this processor. PALcode
        memory space must be page aligned. See Section 3.3.1.2 or Section 3.3.3.3.

+160    PA OF PALCODE SCRATCH SPACE¹,²,⁶
        Starting physical address of PALcode scratch space for this processor. PALcode
        scratch space must be page aligned. See Section 3.3.1.2 or Section 3.3.3.3.

¹ Initialized by the console for the primary at cold system bootstrap only. Preserved
  unchanged by the console at all other times.

² Initialized by the console for a secondary at cold system bootstrap only. Preserved
  unchanged by the console at all other times.

³ Initialized by the console for the primary at all system bootstraps (cold or warm)
  and for a secondary prior to processor start.

⁶ May be modified by system software for a secondary prior to processor start.





Table 2-3 (Cont.): Per-CPU Slot Fields

Offset  Description

+168    PALCODE REVISION¹,²
        PALcode revision level for this processor.

        Bits      Interpretation

        <7:0>     PALcode minor revision (0-255)
        <15:8>    PALcode major revision (0-255)
        <23:16>   PALcode variation
                    0         Reserved
                    1         OpenVMS PALcode version
                    2         DEC OSF/1 PALcode version
                    3-127     Reserved for Digital
                    128-255   Reserved for non-Digital
        <31:24>   SBZ
        <47:32>   PALcode compatibility (0-65535)
                    0         Unknown
                    1-65535   Compatibility revision
        <63:48>   Maximum number of processors that can share this PALcode
                  image

        The major and minor PALcode revisions are set at console initialization; the
        remaining fields are set during PALcode loading and initialization. See
        Section 2.1.1 and Section 3.3.3.3.

+176    PROCESSOR TYPE¹,²
        Type of this processor.

        Bits      Interpretation

        <31:0>    Minor type
        <63:32>   Major type

        Assigned values are summarized in Appendix D; see Section 2.1.1.

+184    PROCESSOR VARIATION¹,²
        Variation or subtype of this processor. Assigned values are summarized in
        Appendix D; see Section 2.1.1.

¹ Initialized by the console for the primary at cold system bootstrap only. Preserved
  unchanged by the console at all other times.

² Initialized by the console for a secondary at cold system bootstrap only. Preserved
  unchanged by the console at all other times.
Table 2-3 (Cont.): Per-CPU Slot Fields

Offset  Description

+192    PROCESSOR REVISION¹,²
        Full DEC STD 12 revision field for this processor. This quadword field contains 4
        ASCII characters. See Section 2.1.1.

+200    PROCESSOR SERIAL NUMBER¹,²
        Full DEC STD serial number for this processor. This octaword field contains a
        10-character ASCII serial number determined at the time of manufacture; see
        DEC STD 12 for format information.

+216    PA OF LOGOUT AREA¹,²
        Starting physical address of PALcode logout area for this processor. Logout areas
        must be at least quadword aligned. See OpenVMS Section, Chapter 6.

+224    LOGOUT AREA LENGTH¹,²
        Number of bytes in the PALcode logout area for this processor. See OpenVMS
        Section, Chapter 6.

+232    HALT PCBB³,⁴
        Value of the PCBB IPR when a processor halt condition is encountered by this
        processor. Initialized to the address of the HWPCB at offset [0] from this per-CPU
        slot at system bootstraps or secondary processor starts.

+240    HALT PC³,⁴
        Value of the PC when a processor halt condition is encountered by this processor.
        Zeroed at system bootstraps or secondary processor starts.

+248    HALT PS³,⁴
        Value of the PS when a processor halt condition is encountered by this processor.
        Zeroed at system bootstraps or secondary processor starts.

+256    HALT ARGUMENT LIST³,⁴
        Value of R25 (argument list) when a processor halt condition is encountered by
        this processor. Zeroed at system bootstraps or secondary processor starts.

+264    HALT RETURN ADDRESS³,⁴
        Value of R26 (return address) when a processor halt condition is encountered by
        this processor. Zeroed at system bootstraps or secondary processor starts.

+272    HALT PROCEDURE VALUE³,⁴
        Value of R27 (procedure value) when a processor halt condition is encountered by
        this processor. Zeroed at system bootstraps or secondary processor starts.

¹ Initialized by the console for the primary at cold system bootstrap only. Preserved
  unchanged by the console at all other times.

² Initialized by the console for a secondary at cold system bootstrap only. Preserved
  unchanged by the console at all other times.

³ Initialized by the console for the primary at all system bootstraps (cold or warm)
  and for a secondary prior to processor start.

⁴ Set by the console at all processor halts.







Table 2-3 (Cont.): Per-CPU Slot Fields

Offset  Description

+280    REASON FOR HALT³,⁴
        Indicates why this processor was halted. Values include:

        Code₁₆   Reason

        0        Bootstrap, processor start, or powerfail restart
        1        Console operator requested a system crash
        2        Processor halted due to kernel-stack-not-valid halt
        3        Invalid SCBB
        4        Invalid PTBR
        5        Processor executed CALL_PAL HALT instruction in kernel mode
        6        Double error abort encountered
        7-FFF    Reserved
        other    Implementation-specific

        See OpenVMS Section, Chapter 6 for information on system exceptions associated
        with codes 2 through 6. Set to ‘0’ at console initialization.

+288    RESERVED FOR SOFTWARE⁶
        Reserved for use by system software. Zeroed at system bootstraps or secondary
        processor starts.

+296    RXTX BUFFER AREA
        Used for interprocessor console communication. See Section 2.4.

+464    RESERVED
        Reserved for Digital; SBZ.

³ Initialized by the console for the primary at all system bootstraps (cold or warm)
  and for a secondary prior to processor start.

⁴ Set by the console at all processor halts.

⁶ May be modified by system software for a secondary prior to processor start.
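
The bit fields of the PALCODE REVISION and PROCESSOR TYPE quadwords described in Table 2-3 can be separated with shifts and masks. The following Python sketch is illustrative only; the function and key names are assumptions and do not correspond to any defined console or system software interface.

```python
def decode_palcode_revision(quadword):
    """Split the PALCODE REVISION quadword (slot offset +168) into fields."""
    return {
        "minor": quadword & 0xFF,                     # bits <7:0>
        "major": (quadword >> 8) & 0xFF,              # bits <15:8>
        "variation": (quadword >> 16) & 0xFF,         # bits <23:16>
        "compatibility": (quadword >> 32) & 0xFFFF,   # bits <47:32>
        "max_processors": (quadword >> 48) & 0xFFFF,  # bits <63:48>
    }


def decode_processor_type(quadword):
    """Split the PROCESSOR TYPE quadword (slot offset +176) into fields."""
    return {
        "minor_type": quadword & 0xFFFFFFFF,          # bits <31:0>
        "major_type": (quadword >> 32) & 0xFFFFFFFF,  # bits <63:32>
    }
```

Bits <31:24> of the PALcode revision are SBZ and are deliberately not returned.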


Table 2-4: Per-CPU State Flags

Bit     Description

0       BOOTSTRAP IN PROGRESS (BIP)³,⁵,⁶
        For the primary, this bit indicates that this processor is undergoing a system
        bootstrap. For a secondary, this bit indicates that a CPU start operation is in
        progress. Set by the console and cleared by system software. See Sections 3.3.1.4,
        3.3.3.6, and 3.4.1.

³ Initialized by the console for the primary at all system bootstraps (cold or warm)
  and for a secondary prior to processor start.

⁵ May be modified by system software for the primary.

⁶ May be modified by system software for a secondary prior to processor start.
Table 2-4 (Cont.): Per-CPU State Flags

Bit     Description

1       RESTART CAPABLE (RC)³,⁵,⁶
        Indicates that system software executing on this processor is capable of being
        restarted in the event of a detected error halt, powerfail recovery, or other error
        condition. Cleared by the console and set by system software. See Sections 3.3.1.4,
        3.3.3.6, and 3.4.1.

2       PROCESSOR AVAILABLE (PA)¹,²
        This bit indicates that this processor is available for use by system software. The
        PA bit may differ from the PP bit based on self-test or other diagnostics, or as the
        result of a console command which explicitly sets this processor unavailable.

3       PROCESSOR PRESENT (PP)¹,²
        This bit indicates that this processor is physically present in the configuration.

4       OPERATOR HALTED (OH)³,⁴
        This bit indicates that this processor is in console I/O mode as the result of explicit
        operator action. See Section 3.4.8.

5       CONTEXT VALID (CV)³,⁶
        This bit indicates that the HWPCB in this slot is valid. Set after the console or
        system software initializes the HWPCB in this slot. See Sections 3.3.1.2 and 3.3.3.

6       PALCODE VALID (PV)¹,²
        This bit indicates that this processor’s PALcode is valid. Set after PALcode has been
        successfully loaded and initialized. See Sections 3.3.1.2 and 3.3.3.3.

7       PALCODE MEMORY VALID (PMV)¹,²,⁶
        This bit indicates that this processor’s PALcode memory and scratch space addresses
        are valid. Set after the necessary memory is allocated and the addresses are written
        into the processor’s slot. See Sections 3.3.1.2 and 3.3.3.3.

8       PALCODE LOADED (PL)¹,²,⁶
        This bit indicates that this processor’s PALcode image has been loaded into the
        address given in the processor’s slot PALcode memory space address field. See
        Sections 3.3.1.2 and 3.3.3.3.

15:9    RESERVED; MBZ.

¹ Initialized by the console for the primary at cold system bootstrap only. Preserved
  unchanged by the console at all other times.

² Initialized by the console for a secondary at cold system bootstrap only. Preserved
  unchanged by the console at all other times.

³ Initialized by the console for the primary at all system bootstraps (cold or warm)
  and for a secondary prior to processor start.

⁴ Set by the console at all processor halts.

⁵ May be modified by system software for the primary.

⁶ May be modified by system software for a secondary prior to processor start.





Table 2-4 (Cont.): Per-CPU State Flags

Bit     Description

23:16   HALT REQUESTED³,⁵,⁶
        Indicates the console action requested by system software executing on this
        processor. Values include:

        Code₁₆   Reason

        0        Default (no specific action)
        1        SAVE_TERM/RESTORE_TERM exit
        2        Cold bootstrap requested
        3        Warm bootstrap requested
        4        Remain halted (no restart)
        other    Reserved

        Set to ‘0’ at system bootstraps and secondary processor starts. May be set to
        non-zero by system software prior to processor halt and subsequent processor
        entry into console I/O mode. See Sections 3.4.7 and 3.3.5.

63:24   RESERVED; MBZ.

³ Initialized by the console for the primary at all system bootstraps (cold or warm)
  and for a secondary prior to processor start.

⁵ May be modified by system software for the primary.

⁶ May be modified by system software for a secondary prior to processor start.
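
The per-CPU state flags quadword of Table 2-4 can be unpacked as sketched below in Python. This is a minimal illustration, not a defined interface. The mnemonic RC for RESTART CAPABLE and the assignment of HALT REQUESTED codes 0 through 4 in the order listed are assumptions based on the table as printed.

```python
# Bit positions of the single-bit flags (Table 2-4).
STATE_FLAG_BITS = {
    "BIP": 0, "RC": 1, "PA": 2, "PP": 3, "OH": 4,
    "CV": 5, "PV": 6, "PMV": 7, "PL": 8,
}

# HALT REQUESTED codes, bits <23:16>; assumed to run 0..4 in listed order.
HALT_REQUESTED = {
    0: "default",
    1: "SAVE_TERM/RESTORE_TERM exit",
    2: "cold bootstrap requested",
    3: "warm bootstrap requested",
    4: "remain halted",
}


def decode_state_flags(quadword):
    """Return (set of flag mnemonics that are set, HALT REQUESTED meaning)."""
    flags = {name for name, bit in STATE_FLAG_BITS.items()
             if (quadword >> bit) & 1}
    halt_code = (quadword >> 16) & 0xFF
    return flags, HALT_REQUESTED.get(halt_code, "reserved")
```

A processor that is present and available but not yet started would typically decode to a flag set containing PA and PP, with the default halt request.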


2.1.4 Configuration Data Block


Systems may have a Configuration Data Block (CONFIG). The format of the block 
and whether it exists in a system is implementation-specific. If present, the block 
must be mapped in the bootstrap address space. The CONFIG Offset in the HWRPB 
(HWRPB[208]) contains the virtual address offset of the block; if no CONFIG block 
exists, the offset is zero. The first quadword of a CONFIG block must contain the
size in bytes of the block. The second quadword must contain a checksum for the 
block; the checksum is computed as a 64-bit, 2’s complement sum ignoring overflows. 
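
A 64-bit, 2's complement sum that ignores overflow is simply addition modulo 2⁶⁴. The Python sketch below illustrates the arithmetic. Which quadwords the checksum covers is not stated here; the sketch assumes, purely for illustration, that it covers every quadword of the block except the checksum quadword itself, and that the size in the first quadword is a multiple of 8.

```python
MASK64 = (1 << 64) - 1


def config_checksum(block):
    """Sum the quadwords of a CONFIG block modulo 2**64.

    Assumption (not from the specification text): the checksum quadword
    at byte offset 8 is excluded from the sum.
    """
    size = int.from_bytes(block[0:8], "little")  # first quadword: size in bytes
    total = 0
    for offset in range(0, size, 8):
        if offset == 8:
            continue  # skip the checksum quadword itself (assumed)
        quadword = int.from_bytes(block[offset:offset + 8], "little")
        total = (total + quadword) & MASK64  # overflow is ignored
    return total
```

Masking after each addition keeps the running total within 64 bits, which is what discarding the carry out of bit 63 means.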


2.1.5 Field Replaceable Unit Table 


Systems may have a field replaceable unit (FRU) table. The format of the table and 
whether it exists in a system is implementation-specific. If present, the table must 
be mapped in the bootstrap address space. The FRU Table Offset in the HWRPB
(HWRPB[216]) contains the virtual address offset of the table; if no FRU table exists, 
the offset is zero. 


See the Fault Management Architecture document.

The environment variables provide a simply extensible mechanism for managing
complex console state. Such state may be variable length, may change with system
software, may change as a result of console state changes, and may be established
by the console presentation layer. Environment variables may be read, written, or
saved.


An environment variable consists of an identifier (ID) and a byte stream value main- 
tained by the console. There are three classes of environment variables: 


1. Common to all implementations: ID = 0 to 3F₁₆.

   These have meaning to both the console and system software. All Alpha consoles
   must implement all of these environment variables.

2. Specific to a given console implementation: ID = 40 to 7F₁₆.

   These have meaning to a given console implementation and system software
   implementation. Support for these environment variables is optional.

3. Specific to system software: ID = 80 to FF₁₆.

   These have meaning to a given system software application or implementation;
   the console simply passes these environment variables between the console
   presentation layer and the target application without interpretation. Support for
   these environment variables is optional.


Optional environment variables, if any, supported by a given console must be detailed 
in the relevant console implementation specification and registered with the Alpha 
architecture group. See Appendix E. \ 
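
The three ID ranges can be told apart with a simple range test. The Python sketch below is illustrative; the function name and the returned labels are assumptions chosen for the example.

```python
def environment_variable_class(var_id):
    """Classify an environment variable ID (0 to FF hex) by its range."""
    if not 0 <= var_id <= 0xFF:
        raise ValueError("environment variable IDs are 0 to FF (hex)")
    if var_id <= 0x3F:
        return "common"             # required on all Alpha consoles
    if var_id <= 0x7F:
        return "console-specific"   # optional
    return "software-specific"      # optional; passed without interpretation
```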


The value, format, and size of each environment variable is dependent on the
environment variable and the console implementation. The size of an environment
variable value is specified in bytes. The byte stream value of most environment
variables consists of an ASCII string. Some environment variable values consist of
multiple fields; others consist of lists. Values are parsed as follows:


1. Each field is delimited by one and only one space “ ” (20₁₆).

2. Each list element is delimited by one and only one comma “,” (2C₁₆).

3. Any numeric quantities are expressed in hexadecimal. 

4. All characters are case-blind and may be expressed in uppercase or lowercase. 


Examples of environment variables which have list values are BOOT_DEV,
BOOTED_OSFLAGS, and DUMP_DEV.


PROGRAMMING TEXT 


For example, BOOT_DEV might consist of “0 4 MSCP,0 1 MOP”
and BOOT_OSFLAGS might consist of “7,2,1C”.
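
The parsing rules above (comma between list elements, single space between fields, hexadecimal numbers, case-blind characters) can be sketched in Python as follows. The function names are assumptions, and folding to uppercase is only one way to realize the case-blind rule.

```python
def parse_environment_value(value):
    """Split a value into list elements (comma) and fields (single space).

    Returns a list of elements, each a list of uppercase field strings.
    """
    return [element.split(" ") for element in value.upper().split(",")]


def parse_hex_list(value):
    """Parse a list whose elements are single hexadecimal numbers."""
    return [int(element, 16) for element in value.split(",")]
```

Applied to the sample values, `parse_environment_value("0 4 MSCP,0 1 MOP")` gives two elements of three fields each, and `parse_hex_list("7,2,1C")` gives the integers 7, 2, and 28.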







Appendix E summarizes the format and lengths of the environment variables for 
each implementation. 


System software uses the console environment variable routines to access the en- 
vironment variables. Each environment variable is identified by an identification 
number (ID). If the console resolves the ID, the associated byte stream value is re- 
turned. The console environment variable routines present system software with a 
consistent interface to environment variables regardless of the presentation layer 
and internal console representation. The console operator interacts with the console 
presentation layer to access environment variables. See Section 1.3 for details. 


In a multiprocessor system, the console must ensure that the dynamic state created
by the environment variables is common to all processors. It must not be possible for
a value observed on a secondary to differ from that observed on the primary or
another secondary. This is necessary to support restarting a processor and switching
the primary.


Some environment variables contain critical state which must be maintained across 
console initializations and system power transitions. Other environment variables
contain dynamic state which must be initialized at console initialization and retained 
across warm bootstraps. Still others contain dynamic state which is initialized at 
each system bootstrap. See Section 2.5.2. 


Environment variable values which must be maintained across console initializa- 
tions must be retained in some sort of non-volatile storage. Default values for these 
environment variables must be set prior to system shipment. Thus, there are three 
possible values: the dynamic value, the default value retained in non-volatile stor- 
age, and the initial default value set in non-volatile storage prior to system shipment. 
The console need not preserve the initial default value. If a console implementation
preserves the initial default value, that value is accessible only to the console pre- 
sentation layer; system software accesses only the dynamic and default (last writ- 
ten) values. The dynamic and default values may differ at any time after console 
initialization as the result of changes by system software or the console operator. 


The internal representation and implementation mechanisms of environment
variables are at the complete discretion of the console and are unknown to both system
software and the console presentation layer. The realization of the required non-volatile
storage is also implementation specific.


Table 2—5 lists the environment variables maintained by the console. Each environ- 
ment ID is also assigned a symbolic name which is used to reference the environment 
variable elsewhere in this specification. 

Table 2-5: Required Environment Variables

ID₁₆  Symbol and Description

00    Reserved

01    AUTO_ACTION¹,²
      Console action following an error halt or powerup. Defined values and the
      action invoked are:

      - “BOOT” (544F 4F42₁₆) bootstrap
      - “HALT” (544C 4148₁₆) halt
      - “RESTART” (54 5241 5453 4552₁₆) restart

      Any other value causes a halt. The default value when the system is shipped
      is “HALT” (544C 4148₁₆). See Section 3.1.1.

02    BOOT_DEV²
      Device list used by the last (or currently in progress) bootstrap attempt. The
      console modifies BOOT_DEV at console initialization and when a bootstrap
      attempt is initiated by a BOOT command. The value of BOOT_DEV is set from
      the device list specified with the BOOT command or, if no device list is
      specified, BOOTDEF_DEV. The console uses BOOT_DEV without change on all
      bootstrap attempts which are not initiated by a BOOT command. See
      Section 3.3.1.5. The format is independent of the console presentation layer;
      registered formats are contained in Appendix E.

03    BOOTDEF_DEV¹
      Device list from which bootstrapping is to be attempted when no path is
      specified by a BOOT command. See Section 3.3.1.5. The format follows
      BOOT_DEV. The default value when the system is shipped indicates a valid
      implementation-specific device or NULL (00₁₆).

04    BOOTED_DEV³
      Device used by the last (or currently in progress) bootstrap attempt. Value is
      one of the devices in the BOOT_DEV list. See Section 3.3.1.5. The format is
      independent of the console presentation layer; registered formats are
      contained in Appendix E.

05    BOOT_FILE¹,²
      Filename to be used when a bootstrap requires a filename and when the
      bootstrap is not the result of a BOOT command or when no filename is specified
      on a BOOT command. The console passes the value between the console
      presentation layer and system software without interpretation; the value is
      preserved across warm bootstraps. The default value when the system is
      shipped is NULL (00₁₆).

¹ Non-volatile. The last value saved by system software or set by console commands
  is preserved across system initializations, cold bootstraps, and long power outages.

² Warm non-volatile. The last value set by system software is preserved across warm
  bootstraps and restarts.

³ Read-only. The variable cannot be modified by system software or console commands.

Table 2-5 (Cont.): Required Environment Variables

ID₁₆  Symbol and Description

06    BOOTED_FILE³
      Filename used by the last (or currently in progress) bootstrap attempt. The
      value is derived from BOOT_FILE or the current BOOT command. The console
      passes the value between the console presentation layer and system software
      without interpretation.

07    BOOT_OSFLAGS¹,²
      Additional parameters to be passed to system software when the bootstrap is
      not the result of a BOOT command or when none are specified on a BOOT
      command. The console preserves the value across warm bootstraps and passes
      the value between the console presentation layer and system software without
      interpretation. The default value when the system is shipped is NULL (00₁₆).

08    BOOTED_OSFLAGS³
      Additional parameters passed to system software during the last (or currently
      in progress) bootstrap attempt. The value is derived from BOOT_OSFLAGS or
      the current BOOT command. The console passes the value between the console
      presentation layer and system software without interpretation.

09    BOOT_RESET¹
      Indicates whether a full system reset is performed in response to an error halt
      or BOOT command. Defined values and the action invoked are:

      - “OFF” (46 464F₁₆) warm bootstrap, no full system reset is performed.
      - “ON” (4E4F₁₆) cold bootstrap, a full system reset is performed.

      See Sections 3.3.1 and 3.3.2. The default value when the system is shipped is
      implementation-specific.

0A    DUMP_DEV¹,²
      Device used to write operating system crash dumps. The format follows
      BOOTED_DEV and is independent of the console presentation layer; registered
      formats are contained in Appendix E. The value is preserved across warm
      bootstraps. The default value when the system is shipped indicates an
      implementation-specific device or NULL (00₁₆).

¹ Non-volatile. The last value saved by system software or set by console commands
  is preserved across system initializations, cold bootstraps, and long power outages.

² Warm non-volatile. The last value set by system software is preserved across warm
  bootstraps and restarts.

³ Read-only. The variable cannot be modified by system software or console commands.

Table 2-5 (Cont.): Required Environment Variables

ID₁₆  Symbol and Description

0B    ENABLE_AUDIT¹
      Indicates whether audit trail messages are to be generated during bootstrap.
      Defined values and the action invoked are:

      - “OFF” (46 464F₁₆). Audit trail messages suppressed.
      - “ON” (4E4F₁₆). Audit trail messages generated.

      The default value when the system is shipped is “ON” (4E4F₁₆).

0C    LICENSE¹,³
      Software license in effect. The value is derived in an implementation-specific
      manner during console initialization. Defined values and (optional) software
      interpretation are:

      - “MU” (554D₁₆) multiple user system.
      - “SU” (5553₁₆) single user system.

      \ Note that the mechanism used to derive the value of LICENSE should NOT
      be documented in customer-available literature. \

0D    CHAR_SET¹,²
      Current console terminal character-set encoding. Defined values are given in
      Table 2-7. The default value when the system is shipped is determined by the
      manufacturing site.

0E    LANGUAGE¹,²
      Current console terminal language. Defined values are given in Table 2-6.
      The default value when the system is shipped is determined by the
      manufacturing site.

0F    TTY_DEV¹,²,³
      Current console terminal unit. Indicates which entry of the CTB Table
      corresponds to the actual console terminal. The value is preserved across warm
      bootstraps. The default value is “0” (30₁₆).

10-3F Reserved for Digital.

40-7F Reserved for console implementation use.

80-FF Reserved for system software use.

¹ Non-volatile. The last value saved by system software or set by console commands
  is preserved across system initializations, cold bootstraps, and long power outages.

² Warm non-volatile. The last value set by system software is preserved across warm
  bootstraps and restarts.

³ Read-only. The variable cannot be modified by system software or console commands.
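
The hexadecimal values quoted beside the string values in Table 2-5 appear to be the ASCII bytes of the string read as a little-endian integer, so that the first character occupies the low-order byte. The Python sketch below illustrates that reading; it is an interpretation of the listed values, and the function name is an assumption.

```python
def string_value_as_hex(text):
    """Return the integer obtained by reading the ASCII bytes of text
    as a little-endian quantity (first character in the low-order byte)."""
    return int.from_bytes(text.encode("ascii"), "little")
```

Under this reading, “BOOT” is 544F4F42₁₆ and “ON” is 4E4F₁₆, which agrees with the values given for AUTO_ACTION and BOOT_RESET.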





Table 2-6: Supported Languages 


LANGUAGE₁₆  Language                     Character-Set

0           none (cryptic)               ISO-LATIN-1     1
30          Dansk                        ISO-LATIN-1     1
32          Deutsch                      ISO-LATIN-1     1
34          Deutsch (Schweiz)            ISO-LATIN-1     1
36          English (American)           ISO-LATIN-1     1
38          English (British/Irish)      ISO-LATIN-1     1
3A          Espanol                      ISO-LATIN-1     1
3C          Francais                     ISO-LATIN-1     1
3E          Francais (Canadian)          ISO-LATIN-1     1
40          Francais (Suisse Romande)    ISO-LATIN-1     1
42          Italiano                     ISO-LATIN-1     1
44          Nederlands                   ISO-LATIN-1     1
46          Norsk                        ISO-LATIN-1     1
48          Portugues                    ISO-LATIN-1     1
4A          Suomi                        ISO-LATIN-1     1
4C          Svenska                      ISO-LATIN-1     1
4E          Vlaams                       ISO-LATIN-1     1
other       reserved                     TBD             TBD


Table 2-7: Supported Character Sets 


CHAR_SET₁₆  Character-Set

0           ISO-LATIN-1
other       TBD


2.3 Console Callback Routines 


System software can access certain system hardware components through a set of 
callback routines provided by the Alpha console. These routines give system software 
an architecturally consistent and relatively simple interface to those components. 


All of the console callback routines may be used by system software when the
operating system has only restricted functionality, such as during bootstrap or crash.
When invoked in this context, the console may assume full control of system platform
hardware. Some of the console callback routines may be used by system software
when the operating system is fully functional. Such usage imposes constraints on
the console implementation.


All routines must be called by system software executing in kernel mode. All routines
require that the HWRPB and the per-CPU, CTB, and CRB offset blocks are virtually
mapped and kernel read/write accessible. If these conditions are not met, the results
are UNDEFINED. Some of the routines execute correctly only at or above certain
IPLs.


The routines must never modify any processor registers except those explicitly
indicated by the routine descriptions.


2.3.1 System Software Use of Console Callback Routines




Those console callback routines which are intended for use while the operating sys- 
tem is fully functional execute in the unmodified context of that operating system. 
The console must not usurp operating system control of system platform hardware. 
These routines must: 


1. Not alter the current IPL.

2. Not alter the current execution mode.

3. Not disable or mask interrupts.

4. Not alter any registers except as explicitly defined by the routine interface.

5. Not alter the existing memory management policy.

6. Not usurp any existing interrupt mechanisms.

7. Be interruptible.

8. Ensure timely completion.


Once the operating system is bootstrapped, the console must not reclaim resources 
transferred to that operating system. This includes both the issuing and servicing 
of I/O device interrupts, interprocessor interrupts, and exceptions. 


It is the responsibility of the console implementation to ensure that these console 
callback routines may be invoked at multiple IPLs, may be interrupted, and may be 
invoked by multiple system software threads. The operation of these routines must 
appear to be atomic to the calling system software even if that software thread is 
interrupted. See Section 2.5.3.1.


In a multiprocessor system, some console routines may be invoked only on the pri- 
mary processor. A secondary processor may invoke only a subset of these routines 
and then only under a limited set of conditions. These conditions are explicitly stated 
in the routine descriptions; if violated, the results are UNDEFINED. 







2.3.2 System Software Invocation of Console Callback Routines 


With the exception of the FIXUP routine, all of the routines are accessed uniformly
through a common DISPATCH procedure. The target routine is identified by a function
code. All console callback routines are invoked using the Alpha standard calling
conventions.


Any memory management exceptions generated by incorrect mapping or inaccessi- 
bility of console callback routine parameters are serviced by the operating system. 
This occurs naturally for those console callback routines which are intended for use 
while the operating system is fully functional; these routines execute in the unmodi- 
fied context of that operating system. For those routines intended for use only while 
the operating system has restricted functionality, the DISPATCH routine must
ensure that any mapping or accessibility conflicts are resolved prior to permitting the
console to gain control of the system platform hardware. 

2.3.3 Console Callback Routine Summary 
The console callback routines fall into four functional groups: 
1. Console terminal interaction. 
2. Generic I/O device access. 
3. Environment variable manipulation. 
4. Miscellaneous. 
The hexadecimal function code, name, and function for each routine are summarized 
in Table 2-8. 


Table 2-8: Console Callback Routines

Code₁₆  Name             Function Invoked

Console Terminal Routines

01      GETC             Get character from console terminal
02      PUTS             Put byte stream to console terminal
03      RESET_TERM       Reset console terminal to default
04      SET_TERM_INT     Set console terminal interrupts
05      SET_TERM_CTL     Set console terminal controls
06      PROCESS_KEYCODE  Process and translate keycode
07-0F                    reserved

Table 2-8 (Cont.): Console Callback Routines

Code₁₆  Name             Function Invoked

Console Generic I/O Device Routines

10      OPEN             Open I/O device for access
11      CLOSE            Close I/O device for access
12      IOCTL            Perform I/O device-specific operations
13      READ             Read I/O device
14      WRITE            Write I/O device
15-1F                    reserved

Console Environment Variable Routines

20      SET_ENV          Set (write) an environment variable
21      RESET_ENV        Reset (default) an environment variable
22      GET_ENV          Get (read) an environment variable
23      SAVE_ENV         Save current environment variables

Console Miscellaneous Routines

30      PSWITCH          Switch primary processor
(none)  FIXUP            Remap console callback routines
(none)  DISPATCH         Access console callback routine

other                    reserved


All Alpha consoles must implement:

1. All console terminal routines except PROCESS_KEYCODE.
2. All console generic I/O device routines.
3. All environment variable routines except SAVE_ENV.
4. The FIXUP and DISPATCH miscellaneous routines.

The PSWITCH routine is required for all Alpha multiprocessor systems which support
dynamic primary switching. See Section 3.4.6.
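
The function codes of Table 2-8 and the implementation requirements above can be captured in a small lookup. The following Python sketch is illustrative only; the names FUNCTION_CODES, NOT_ALWAYS_REQUIRED, and is_required are assumptions and are not defined by the architecture. FIXUP and DISPATCH have no function code and are therefore omitted.

```python
# Function codes from Table 2-8 (hexadecimal).
FUNCTION_CODES = {
    "GETC": 0x01, "PUTS": 0x02, "RESET_TERM": 0x03,
    "SET_TERM_INT": 0x04, "SET_TERM_CTL": 0x05, "PROCESS_KEYCODE": 0x06,
    "OPEN": 0x10, "CLOSE": 0x11, "IOCTL": 0x12, "READ": 0x13, "WRITE": 0x14,
    "SET_ENV": 0x20, "RESET_ENV": 0x21, "GET_ENV": 0x22, "SAVE_ENV": 0x23,
    "PSWITCH": 0x30,
}

# Routines that are not required of every console. PSWITCH is required
# only on multiprocessor systems that support dynamic primary switching.
NOT_ALWAYS_REQUIRED = {"PROCESS_KEYCODE", "SAVE_ENV", "PSWITCH"}


def is_required(name):
    """Return True if every Alpha console must implement the named routine."""
    if name not in FUNCTION_CODES:
        raise KeyError(name)
    return name not in NOT_ALWAYS_REQUIRED
```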


2.3.4 Console Terminal Routines
Alpha consoles provide system software with a consistent interface to the console
terminal, regardless of the physical realization of that terminal. This interface consists
of the Console Terminal Block (CTB) Table and a number of console terminal
routines. Each CTB contains the characteristics of a terminal device which can be
accessed through the console terminal routines; see Section 2.3.8.2.





There is ONLY ONE console terminal. The CTB Table may contain multiple CTBs
and the console terminal routines may be used to access multiple terminal devices.
Each terminal device is identified by a “unit number” which is the index of its CTB
within the CTB Table. The TTY_DEV environment variable indicates the unit, hence
the CTB, of the console terminal. The console terminal unit is determined at system
bootstrap and cannot be altered by system software. Console terminal device interrupts
are delivered at IPL 20 to the primary processor; interrupts can be redirected
to a secondary only when switching the primary processor.


The console terminal routines permit system software to access the console terminal 
in a device-independent way. These routines may be invoked while the operating 
system is fully functional as well as during operating system bootstrap or crash. All 
console terminal routines are subject to the constraints given in Section 2.3.1.
These routines must: 


1. Not alter the current IPL or current mode. 


These routines must be invoked in kernel mode at or above the console terminal
device IPL 20.


2. Not alter the existing memory management policy. 
All internal pointers must have been remapped by FIXUP. 
3. Not block interrupts. 


The operating system must be capable of continuing to receive hardware inter- 
rupts at higher IPLs. 


4. Be interruptable and re-entrant. 


These routines may be invoked at multiple IPLs and their execution may be in- 
terrupted. Note, however, that console terminal callback operations are not nec- 
essarily atomic. In the event of re-entrant invocations, it is UNPREDICTABLE 
whether or not the interrupted operation will fail and characters may be trans- 
mitted or received out of order. 


The time required for console terminal routines to complete is UNPREDICTABLE;
however, a console implementation will attempt to minimize the time whenever
possible.


SOFTWARE NOTE
To permit use of these routines by OpenVMS, implementations
must limit the execution time to significantly
less than the interval clock interrupt period. A return
after partial operation completion is preferable to long
latency.


When invoking these routines, system software must:

1. Be executing in kernel mode at or above the console terminal device IPL 20.

If these routines are invoked in other modes, their execution causes UNPREDICTABLE
operation. If invoked at lower IPLs, their execution causes UNDEFINED
operation.

2. Be executing on the primary processor in a multiprocessor configuration.

If these routines are invoked on secondary processors, their execution causes
UNDEFINED operation.

3. Be prepared to service any resulting console terminal interrupts, if enabled.

System software must provide valid interrupt service routines for the console
terminal transmit and receive interrupts. The operating system interrupt service
routines must be established prior to enabling interrupts; otherwise the operation
of the system is UNDEFINED.


PROGRAMMING NOTE

Any console terminal interrupt service routines established
by the console prior to transferring control
to operating system software are not transferred
to the operating system nor are they remapped by
FIXUP. Any console terminal interrupts will be delivered
only after the operating system lowers IPL
from the console terminal device IPL.


IMPLEMENTATION NOTE
The implementation of console terminal I/O interrupts
is specific to the system hardware platform. Examples
of implementation-specific characteristics include
console terminal SCB vectors.





2.3.4.1 GETC - Get Character from Console Terminal 


Format:
char = DISPATCH ( GETC, unit )
Inputs:
GETC    = R16;  GETC function code - 01 (hex)
unit    = R17;  terminal device unit number
arginfo = R25;  argument information
retadr  = R26;  return address
procval = R27;  procedure value
Outputs:
char    = R0;   returned character and status:
    R0<63:61>  ‘000’  success, character received
               ‘001’  success, character received, more to be read
               ‘100’  failure, character not yet ready for reception
               ‘110’  failure, character received with error
               ‘111’  failure, character received with error, more to be read
    R0<60:48>  device-specific error status
    R0<47:40>  SBZ
    R0<39:32>  terminal device unit number returning character
    R0<31:0>   character read from console terminal


GETC attempts to read one character from a console terminal device and, if successful,
returns that character in R0<31:0>. The character is not echoed on the terminal
device. The size of the returned character is from one to four bytes and is a function
of the current character-set encoding and language; see Table 2-6. The routine
performs any necessary keycode mapping.


For implementations which support multiple directly addressable terminal devices,
R17 contains the unit number from which to read the character. If the implementation
does not support multiple terminal devices or if the devices are not directly
addressable, R17 SBZ. The unit number from which the character was read is returned
in R0<39:32>. If the implementation does not support multiple terminal
devices, R0<39:32> is returned as zero.

GETC returns character reception status in R0<63:61>. If received characters are
buffered by the console terminal, R0<61> is set to ‘1’ whenever additional characters
are available. If GETC returns a character without error, R0<63:62> is set to ‘00’.
If no character is yet ready, R0<63:62> is set to ‘10’. If an error is encountered
obtaining a character, R0<63:62> is set to ‘11’; examples of errors during character
reception include data overrun or loss of carrier.


When an error is returned by GETC, the contents of R0<31:0> and R0<60:48> depend
on the capabilities of the underlying hardware. Implementations in which the
hardware returns the character in error must provide that character in R0<31:0>.
Additional device-specific error status may be contained in R0<60:48>. See the appropriate
CTB description in Appendix E.

When appropriate, GETC performs special keyboard operations such as turning on
or off keyboard LEDs. Such action is based on the incoming stream of keycodes
delivered by the console terminal. See the appropriate device CTB description in
Appendix E for more details.
The return address indicated by R26 should be mapped and kernel executable. 
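
The layout of the GETC return value can be illustrated by unpacking R0 into its fields. This is a minimal Python sketch that treats R0 as a 64-bit unsigned integer; the function name decode_getc and the dictionary keys are illustrative assumptions, not part of the console interface.

```python
def decode_getc(r0):
    """Split the 64-bit GETC return value (R0) into its documented fields."""
    status = (r0 >> 61) & 0x7                      # R0<63:61>
    return {
        "failure": bool(status & 0b100),           # R0<63> set
        "error": (status & 0b110) == 0b110,        # R0<63:62> = '11'
        "not_ready": (status & 0b110) == 0b100,    # R0<63:62> = '10'
        "more": bool(status & 0b001),              # R0<61>: more to be read
        "device_status": (r0 >> 48) & 0x1FFF,      # R0<60:48>
        "unit": (r0 >> 32) & 0xFF,                 # R0<39:32>
        "char": r0 & 0xFFFFFFFF,                   # R0<31:0>
    }
```

A caller would typically test "not_ready" and "error" before using "char", and call GETC again while "more" is set.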







2.3.4.2 PROCESS_KEYCODE - Process and Translate Keycode

Format:
char = DISPATCH ( PROCESS_KEYCODE, unit, keycode, again )
Inputs:
PROCESS_KEYCODE = R16;  PROCESS_KEYCODE function code - 06 (hex)
unit            = R17;  terminal device unit number
keycode         = R18;  keycode to be processed
again           = R19;  ‘1’ if calling again for same keycode
                        ‘0’ otherwise
arginfo         = R25;  argument information
retadr          = R26;  return address
procval         = R27;  procedure value
Outputs:
char            = R0;   translated character and status:
    R0<63:61>  ‘000’  success, character returned
               ‘101’  failure, more time needed to process keycode
               ‘110’  failure, device not supported by routine or routine
                      not supported
               ‘111’  failure, no character - more keycodes needed or
                      illegal sequence encountered
    R0<60>     ‘0’    success in correcting severe error
               ‘1’    failure in correcting severe error
    R0<59:32>  SBZ
    R0<31:0>   translated character


PROCESS_KEYCODE attempts to translate the keycode contained in R18 and, if 
successful, returns the character in R0<31:0>. The translation is based on the cur- 
rent character-set encoding, language, and console terminal device state contained in 
the appropriate CTB. The translated character may be from one to four bytes. For 
implementations which support multiple terminal devices, R17 contains the unit
number of the keyboard; R17 SBZ otherwise.

IMPLEMENTATION NOTE
For ISO-LATIN-1 character-set encoding, PROCESS_KEYCODE
returns a one byte character; see Section 2.5.3.2.1.


PROCESS_KEYCODE returns keycode translation status in R0<63:61>. The processing
falls into one of several cases:

1. The keycode, along with previous keycodes if any, translates into a character
from the currently selected character-set. In this case, R0<63:61> is set to ‘000’.

2. The keycode, along with previously entered keycodes if any, does not translate
into a character from the currently selected character-set. This is because either:

• there are not yet enough keycodes entered to produce a character in the
currently selected character-set

• the keycodes entered to this point indicate a severe keyboard error status

• the keycodes entered to this point form an illegal or unsupported keycode
sequence

In this case, R0<63:61> is set to ‘111’.

3. The console terminal device for which keycode translation is being performed
is not supported by the PROCESS_KEYCODE implementation or the console
implementation does not support PROCESS_KEYCODE. In this case, R0<63:61>
is set to ‘110’.

4. The keycode cannot be processed in a reasonable amount of time; multiple invocations
of PROCESS_KEYCODE are necessary. In this case, the routine returns
with R0<63:61> set to ‘101’. The subsequent call(s) should be made with the
same keycode in R18 and R19 set to ‘1’.


IMPLEMENTATION NOTE 

It may not be possible for an implementation to 
perform all the actions associated with special key- 
codes (such as turning on LEDs) in a timely manner. 
The PROCESS_KEYCODE routine must return af- 
ter partial operation completion if necessary. It is 
the responsibility of the console to ensure that sub- 
sequent calls make forward progress. The delay be- 
tween successive operating system calls is UNPRE- 
DICTABLE, although the operating system should 
attempt to complete the operation in a timely fash- 
ion. See Sections 2.3.4 and 2.5.3.1. 


In all but the first case, the contents of R0<31:0> are UNPREDICTABLE.


When certain severe keyboard errors are encountered, PROCESS_KEYCODE attempts
to correct them by performing special keyboard operations. Those severe errors
which may be corrected are device-specific and contained in the terminal device
CTB. If an error is encountered and the attempt to correct the error is unsuccessful,
R0<60> is set to ‘1’; otherwise R0<60> is set to ‘0’.





The keyboard state recorded in the CTB is updated appropriately as the input stream
of keycodes is processed. If appropriate, PROCESS_KEYCODE may buffer some
of the keycodes in the CTB keycode buffer. The supported keyboard state changes
are device-specific and are listed in the device CTB.


The return address indicated by R26 should be mapped and kernel executable.

2.3.4.3 PUTS - Put Stream to Console Terminal

Format:
wcount = DISPATCH ( PUTS, unit, address, length )
Inputs:
PUTS    = R16;  PUTS function code - 02 (hex)
unit    = R17;  terminal device unit number
address = R18;  virtual address of byte stream to be written
length  = R19;  count of bytes to be written
arginfo = R25;  argument information
retadr  = R26;  return address
procval = R27;  procedure value
Outputs:
wcount  = R0;   count of bytes written and status:
    R0<63:61>  ‘000’  success, all bytes written
               ‘001’  success, some bytes written
               ‘100’  failure, no bytes written, terminal not ready
               ‘110’  failure, no bytes written, terminal error encountered
               ‘111’  failure, some bytes written, terminal error encountered
    R0<60:48>  device-specific error status
    R0<47:32>  SBZ
    R0<31:0>   count of bytes written (unsigned)


PUTS attempts to write a number of bytes to a console terminal device. R18 contains
the base virtual address of the memory-resident byte stream; R19 contains its 32-bit
size in bytes. The byte stream is written in order with no interpretation or special
handling. The count of the bytes transmitted is returned in R0<31:0>.


PROGRAMMING NOTE 


For multiple byte character-set encodings, the returned
byte count may indicate a partial character transmission.


For implementations which support multiple terminal devices, R17 contains the unit 
number to which the byte stream is to be written; R17 SBZ otherwise. 







PUTS returns byte stream transmission status in R0<63:61>. If only a portion of the
byte stream was written, R0<61> is set to ‘1’. If no error is encountered, R0<63:62> is
set to ‘00’. If no bytes were written because the terminal was not ready, R0<63:62>
is set to ‘10’. If an error is encountered writing a byte, R0<63:62> is set to ‘11’;
examples of errors during byte transmission include data overrun or loss of carrier.


When an error is returned by PUTS, additional device-specific error status may be
contained in R0<60:48>. See the appropriate CTB description in Appendix E for
more details.


Multiple invocations of PUTS may be necessary because the console terminal may 
accept only a very few bytes in a reasonable period of time. 
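
Because a single PUTS call may transmit only part of the byte stream, callers generally loop until the whole stream has been accepted. The Python sketch below illustrates that pattern under stated assumptions: dispatch is a hypothetical stand-in for the DISPATCH routine that returns R0 as an integer, a Python bytes object stands in for the virtual address passed in R18, and max_calls is an arbitrary bound chosen for this sketch.

```python
PUTS = 0x02                     # function code from Table 2-8


def put_stream(dispatch, unit, data, max_calls=1000):
    """Write `data` with repeated PUTS calls; return the bytes written."""
    sent = 0
    for _ in range(max_calls):
        if sent == len(data):
            break
        r0 = dispatch(PUTS, unit, data[sent:], len(data) - sent)
        sent += r0 & 0xFFFFFFFF                     # R0<31:0>: bytes written
        if ((r0 >> 62) & 0x3) == 0b11:              # R0<63:62> = '11': error
            status = (r0 >> 48) & 0x1FFF            # R0<60:48>
            raise OSError("PUTS terminal error, device status %#x" % status)
        # R0<63:62> = '10' (terminal not ready) simply retries.
    return sent
```

For multiple byte character-set encodings the loop resumes at a byte offset, so a character split across two calls is still transmitted in order.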


The output byte stream located by R18 should be mapped and kernel read accessible;
the return address indicated by R26 should be mapped and kernel executable.

2.3.4.4 RESET_TERM - Reset Console Terminal to Default Parameters

Format:
status = DISPATCH ( RESET_TERM, unit )
Inputs:
RESET_TERM = R16;  RESET_TERM function code - 03 (hex)
unit       = R17;  terminal device unit number
arginfo    = R25;  argument information
retadr     = R26;  return address
procval    = R27;  procedure value
Outputs:
status     = R0;   status:
    R0<63>    ‘0’  success, terminal reset
              ‘1’  failure, terminal not fully reset
    R0<62:0>  SBZ


RESET_TERM resets a console terminal device and its CTB to their initial, default 
state. All errors in the CTB are cleared. For implementations which support multi- 
ple terminal devices, R17 contains the unit number to be reset; R17 SBZ otherwise. 


The CTB describes the capabilities of the terminal device and its initial, default state. 
Depending on the terminal device type and particular console implementation, other 
terminal devices may be affected by the routine. 


PROGRAMMING NOTE 
For example, if multiple terminal units share a common 


interrupt, that interrupt may be disabled or enabled for 
all. 


If the console terminal is successfully reset, RESET_TERM returns with R0<63> set
to ‘0’. If errors are encountered, the routine attempts to return the console terminal
to a usable state and then returns with R0<63> set to ‘1’.


The return address indicated by R26 should be mapped and kernel executable.

2.3.4.5 SET_TERM_CTL - Set Console Terminal Controls

Format:
status = DISPATCH ( SET_TERM_CTL, unit, ctb )
Inputs:
SET_TERM_CTL = R16;  SET_TERM_CTL function code - 05 (hex)
unit         = R17;  terminal device unit number
ctb          = R18;  virtual address of CTB
arginfo      = R25;  argument information
retadr       = R26;  return address
procval      = R27;  procedure value
Outputs:
status       = R0;   status:
    R0<63>     ‘0’  success, requested change completed
               ‘1’  failure, change not completed
    R0<62:32>  SBZ
    R0<31:0>   offset to offending CTB field (unsigned)


SET_TERM_CTL, if successful, changes the characteristics of a console terminal device
and updates its CTB. The changes are specified by fields contained in a CTB
located by R18. The characteristics which can be changed, hence the active CTB
fields, depend on the console terminal device type; see the appropriate CTB description
in Appendix E. For implementations which support multiple terminal devices,
R17 contains the unit number to be reset; R17 SBZ otherwise.


If the console terminal characteristics are successfully changed, SET_TERM_CTL
returns with R0<63> set to ‘0’. If errors are encountered or if the terminal device
does not support the requested settings, the routine attempts to return the device
to the previous usable state and then returns with R0<63> set to ‘1’ and R0<31:0>
set to the offset of an offending or unsupported field in the CTB located by R18.
Regardless of success or failure, the device CTB Table entry always contains the
current device characteristics upon routine return. SET_TERM_CTL returns the
CTB located by R18 without modification.


The CTB located by R18 should be mapped and kernel read accessible; the return
address indicated by R26 should be mapped and kernel executable.

2.3.4.6 SET_TERM_INT - Set Console Terminal Interrupts

Format:
status = DISPATCH ( SET_TERM_INT, unit, mask )
Inputs:
SET_TERM_INT = R16;  SET_TERM_INT function code - 04 (hex)
unit         = R17;  terminal device unit number
mask         = R18;  bit encoded mask:
    R18<1:0>    ‘01’  no change to transmit interrupts
                ‘00’  disable transmit interrupts
                ‘1X’  enable transmit interrupts
    R18<7:2>    SBZ
    R18<9:8>    ‘01’  no change to receive interrupts
                ‘00’  disable receive interrupts
                ‘1X’  enable receive interrupts
    R18<63:10>  SBZ
arginfo      = R25;  argument information
retadr       = R26;  return address
procval      = R27;  procedure value
Outputs:
status       = R0;   status:
    R0<63>    ‘0’  success
              ‘1’  failure, operation not supported
    R0<62:2>  SBZ
    R0<0>     ‘1’  transmit interrupts enabled
              ‘0’  transmit interrupts disabled
    R0<1>     ‘1’  receive interrupts enabled
              ‘0’  receive interrupts disabled

SET_TERM_INT reads, enables, and disables transmit and receive interrupts from
a console terminal device and updates its CTB. For implementations which support
multiple terminal devices, R17 contains the unit number to be reset; R17 SBZ
otherwise.


If the interrupt settings are successfully changed, the routine returns with R0<63>
set to ‘0’. If the terminal device does not support the requested setting, then the
routine returns with R0<63> set to ‘1’.

PROGRAMMING NOTE
For example, a device which has a unified transmit/receive
interrupt would not support a request to
enable transmit interrupts while leaving receive interrupts
disabled.


Regardless of success or failure, the routine always returns with the previous settings
in R0<1:0>. The current state of the interrupt settings can be read without
change by invoking SET_TERM_INT with R18<1:0> and R18<9:8> set to ‘01’.
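
The mask encoding and the returned settings can be illustrated as follows. This Python sketch is not part of the console interface; the names NO_CHANGE, DISABLE, ENABLE, term_int_mask, and decode_term_int are assumptions. ENABLE uses ‘10’, one of the two encodings permitted by ‘1X’.

```python
NO_CHANGE, DISABLE, ENABLE = 0b01, 0b00, 0b10


def term_int_mask(transmit=NO_CHANGE, receive=NO_CHANGE):
    """Build the R18 mask: transmit in R18<1:0>, receive in R18<9:8>."""
    return (receive << 8) | transmit


def decode_term_int(r0):
    """Return (failed, transmit_was_enabled, receive_was_enabled) from R0."""
    failed = bool((r0 >> 63) & 1)       # R0<63>
    transmit = bool(r0 & 1)             # R0<0>
    receive = bool((r0 >> 1) & 1)       # R0<1>
    return failed, transmit, receive
```

Calling term_int_mask() with no arguments yields the read-only query described above, with both fields set to ‘01’.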


The return address indicated by R26 should be mapped and kernel executable.

2.3.5 Console Generic I/O Device Routines
The Alpha console provides primitive generic I/O device routines for system software
use during the bootstrap or crash process. These routines serve in place of the more
sophisticated system software I/O drivers until such time as these drivers can be
established. These routines may also be used to access console-private devices which
are not directly accessible by the processor.


During the bootstrap process, these routines can be used to acquire a secondary
bootstrap program from a system bootstrap device or to write messages to a terminal
other than the logical console terminal. When the operating system is about to crash,
these routines can be used to write dump files.


These routines are NOT intended for use while the operating system is fully functional.
These routines may:


1. Alter the current IPL. 


The console may raise, but not lower, the IPL for the duration of the routine
execution.


2. Block interrupts. 


These routines may cause any and all interrupts to be blocked or delivered to 
and serviced by the console for the duration of the routine execution. 


3. Block exceptions. 


These routines may cause any and all exceptions to be blocked or delivered to and
serviced by the console for the duration of the routine execution.


4. Alter the existing memory management policy. 


The console may substitute a console-private (or bootstrap address) mapping for 
the duration of the routine execution. 


PROGRAMMING NOTE 
The console must resolve any virtually addressed ar- 
guments prior to altering the existing memory man- 
agement policy. 


5. Take any length of time for completion. 


The operating system has no timeliness guarantee when invoking these routines. 
Any operating system timer may have expired by their return. The time nec- 
essary for completion is UNPREDICTABLE; however, a console implementation 
will attempt to minimize the time whenever possible. 


Prior to returning to the invoking system software, these routines must restore any 
altered processor state. These routines must return to the calling system software 
at the IPL and in the memory management policy of that software. 


System software invokes these routines synchronously. When invoking these rou- 
tines, system software must: 





1. Be executing in kernel mode. 


If these routines are invoked in other modes, their execution causes UNPRE- 
DICTABLE operation. 


2. Be executing on the primary processor in a multiprocessor configuration. 


If these routines are invoked on other processors, their execution causes UNDEFINED
operation.

2.3.5.1 CLOSE - Close Generic I/O Device for Access

Format:
status = DISPATCH ( CLOSE, channel )
Inputs:
CLOSE   = R16;  CLOSE function code - 11 (hex)
channel = R17;  channel to close
arginfo = R25;  argument information
retadr  = R26;  return address
procval = R27;  procedure value
Outputs:
status  = R0;   status:
    R0<63>     ‘0’  success
               ‘1’  failure
    R0<62:60>  SBZ
    R0<59:32>  device-specific error status
    R0<31:0>   SBZ


CLOSE deassigns the channel number from a previously opened block storage
I/O device. The channel number is free to be reassigned. The I/O device must
be reopened prior to any subsequent accesses.

CLOSE returns status in R0<63>. If the channel was open and the close is successful,
R0<63> is set to ‘0’; otherwise R0<63> is set to ‘1’ and additional device-specific status
is recorded in R0<62:32>.

For magnetic tape devices, CLOSE does not affect the current tape position nor is
any rewind of the tape performed.


The return address indicated by R26 should be mapped and kernel executable.

2.3.5.2 IOCTL - Perform Device-specific Operations

Format:
count = DISPATCH ( IOCTL, channel, R18, R19, R20, R21 )
Inputs:
IOCTL   = R16;  IOCTL function code - 12 (hex)
channel = R17;  channel number of device to be accessed
arginfo = R25;  argument information
retadr  = R26;  return address
procval = R27;  procedure value
For Magnetic Tape Devices Only:
operate = R18;  tape positioning operation:
                ‘01’ for SKIP to next/previous Inter-Record Gap
                ‘02’ for SKIP over Tape Mark
                ‘03’ for REWIND
                ‘04’ for write Tape Mark
count   = R19;  number of SKIPs to perform (signed)
R20-R21         reserved for future use as inputs
Outputs:
For Magnetic Tape Devices Only:
count   = R0;   number of skips performed and status:
    R0<63:62>  ‘00’  success
               ‘10’  failure, position not found
               ‘11’  hardware failure
    R0<61:60>  SBZ
    R0<59:32>  device-specific error status
    R0<31:0>   number of SKIPs actually performed (signed)


IOCTL performs special device-specific operations on I/O devices. The operation 
performed and the interpretation of any additional arguments passed in R18 - R21 
are functions of the device type as designated by the channel number passed in R17. 


For magnetic tape devices, the following operations are defined:

1. ‘01’ - IOCTL relocates the current tape position by skipping over a number of
inter-record gaps. The direction of the skip and the number of gaps skipped
is given by the signed 32-bit count in R19. Skipping with a count of ‘0’ does
not change the current tape position. The number of gaps actually skipped is
returned in R0<31:0>.

2. ‘02’ - IOCTL relocates the current tape position by skipping over a number of tape
marks. The direction of the skip and the number of marks skipped is given by
the signed 32-bit count in R19. Skipping with a count of ‘0’ does not change the
current tape position. The number of tape marks actually skipped is returned in
R0<31:0>.

3. ‘03’ - IOCTL rewinds the tape to the position just after the Beginning-Of-Tape
(BOT) marker. R0<31:0> is returned as SBZ.

4. ‘04’ - IOCTL writes a tape mark starting at the current position. R0<31:0> is
returned as SBZ.


IOCTL returns magnetic tape operation status in R0<63:62>. If the operation was
successful, R0<63:62> is set to ‘00’. If the tape positioning was not successful, the
tape is left at the position where the error occurred and R0<63:62> is set to ‘10’.
Tape positioning may fail due to encountering a BOT marker (R18 ‘01’ or ‘02’), encountering
a tape mark (R18 ‘01’), or running off the end of the tape. If a hardware
device error is encountered, the final position of the tape is UNPREDICTABLE and
R0<63:62> is set to ‘11’. In the event of an error, additional device-specific status is
recorded in R0<61:32>.
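
Because the skip count in R0<31:0> is signed, a caller must sign-extend it before use. The following Python sketch shows one way to unpack the magnetic tape IOCTL return value; decode_ioctl_tape and its dictionary keys are illustrative names, and the device status is taken from R0<59:32> as given in the Outputs list.

```python
def decode_ioctl_tape(r0):
    """Unpack the magnetic tape IOCTL return value (R0)."""
    status = (r0 >> 62) & 0x3                       # R0<63:62>
    skips = r0 & 0xFFFFFFFF                         # R0<31:0>
    if skips & 0x80000000:                          # sign-extend 32-bit count
        skips -= 1 << 32
    return {
        "ok": status == 0b00,
        "position_not_found": status == 0b10,
        "hardware_failure": status == 0b11,
        "device_status": (r0 >> 32) & 0x0FFFFFFF,   # R0<59:32>
        "skips": skips,
    }
```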


The return address indicated by R26 should be mapped and kernel executable.

2.3.5.3 OPEN - Open Generic I/O Device for Access

Format:
channel = DISPATCH ( OPEN, devstr, length )
Inputs:
OPEN    = R16;  OPEN function code - 10 (hex)
devstr  = R17;  starting virtual address of byte string which contains the
                device specification
length  = R18;  length of byte string
arginfo = R25;  argument information
retadr  = R26;  return address
procval = R27;  procedure value
Outputs:
channel = R0;   assigned channel number and status:
    R0<63:62>  ‘00’  success
               ‘10’  failure, device does not exist
               ‘11’  failure, error - device cannot be accessed or prepared
    R0<61:60>  SBZ
    R0<59:32>  device-specific error status
    R0<31:0>   assigned channel number of device


OPEN prepares a generic I/O device for use by the READ and WRITE routines. R17
contains the base virtual address of a byte string which specifies the complete device
specification of the I/O device. The length of the string is given in R18. The format
and contents of the device specification string follow that of the BOOTED_DEV
environment variable; see Appendix E.


The routine assigns a unique channel number to the device. The channel number is
returned in R0 and must be used to reference the device in subsequent calls to the
READ, WRITE, and CLOSE routines.


OPEN returns status in R0<63:62>. If the I/O device exists and can be prepared for
subsequent accesses, R0<63:62> is set to ‘00’. If the device does not exist, R0<63:62>
is set to ‘10’. If the device exists, but errors are encountered in preparing the device,
R0<63:62> is set to ‘11’ and additional device-specific status is recorded in R0<61:32>.
In the latter two failure cases, the channel number returned in R0<31:0> is UNPREDICTABLE.

All console implementations must support at least two concurrently opened generic
I/O devices. Additional generic I/O devices may be supported.


PROGRAMMING NOTE
See the relevant console implementation specification
and Appendix E.


For magnetic tape devices, OPEN does not affect the current tape position nor is
any rewind of the tape performed.


Multiple channels cannot be assigned to the same device; the second and any subsequent
calls to OPEN fail with R0<63:62> set to ‘11’ and R0<31:0> as UNPREDICTABLE.
The status of the first opened channel is unaffected.


The input string located by R17 should be mapped and kernel read accessible; the 
return address indicated by R26 should be mapped and kernel executable. 
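
The channel returned by OPEN ties together the subsequent READ, WRITE, and CLOSE calls. The Python sketch below illustrates the open, access, close sequence using READ (Section 2.3.5.4). It rests on several assumptions: dispatch is a hypothetical stand-in for the DISPATCH routine that returns R0 as an integer, Python bytes and bytearray objects stand in for the virtual addresses passed in registers, and read_blocks is an illustrative name.

```python
OPEN, CLOSE, READ = 0x10, 0x11, 0x13    # function codes from Table 2-8


def read_blocks(dispatch, devstr, count, block):
    """OPEN a device, READ `count` bytes starting at `block`, then CLOSE."""
    r0 = dispatch(OPEN, devstr, len(devstr))
    status = (r0 >> 62) & 0x3                       # R0<63:62>
    if status != 0b00:
        # The channel number is UNPREDICTABLE on failure, so nothing to close.
        raise OSError("OPEN failed with status %d" % status)
    channel = r0 & 0xFFFFFFFF                       # R0<31:0>
    try:
        buf = bytearray(count)
        r0 = dispatch(READ, channel, count, buf, block)
        if (r0 >> 63) & 1:                          # R0<63>: READ failure
            raise OSError("READ failed")
        return bytes(buf[: r0 & 0xFFFFFFFF])        # bytes actually read
    finally:
        dispatch(CLOSE, channel)                    # always release the channel
```

Closing in a finally clause keeps the channel from leaking when READ fails, which matters because a device cannot be opened on a second channel while the first remains assigned.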





2.3.5.4 READ - Read Generic I/O Device 


Format:
rcount = DISPATCH ( READ, channel, count, address, block )
Inputs:
READ    = R16;  READ function code - 13 (hex)
channel = R17;  channel number of device to be accessed
count   = R18;  number of bytes to be read (should be multiple of the
                device’s record length) (unsigned)
address = R19;  virtual address of buffer to read data into
block   = R20;  logical block number of data to read (used only by disk
                devices)
arginfo = R25;  argument information
retadr  = R26;  return address
procval = R27;  procedure value
Outputs:
rcount  = R0;   number of bytes read and status:
    R0<63>     ‘0’  success
               ‘1’  failure
    R0<62>     ‘1’  EOT or Logical End of Device condition encountered
               ‘0’  otherwise
    R0<61>     ‘1’  illegal record length specified
               ‘0’  otherwise
    R0<60>     ‘1’  run off end of tape
               ‘0’  otherwise
    R0<59:32>  device-specific error status
    R0<31:0>   number of bytes actually read (unsigned)


READ causes data to be read from the generic I/O device designated by the channel 
number in R17 and written to a memory buffer pointed to by R19. The 32-bit 
transfer byte count, hence length of the buffer, is contained in R18. The buffer must 
be quadword aligned, virtually mapped, and resident in physical memory. 


READ returns transfer status in R0<63:60> and the number of bytes actually read,
if any, in R0<31:0>. If the routine is successful, R0<63> is set to ‘0’. If an error
is encountered accessing the device, R0<63> is set to ‘1’. Additional device-specific
status may be returned in R0<59:32>.

The transfer byte count should be a multiple of the record length of the device. If the
specified byte count is not a multiple of the record length, R0<61> is set to ‘1’. If the
count exceeds the record length, the count is rounded down to the nearest multiple
of the record length and READ attempts to read that number of bytes. If the record
length exceeds the count, it is UNPREDICTABLE whether READ attempts to access
the device. If no read attempt is made, R0<63> is set to ‘1’.


For magnetic tape devices; READ does not interpret the tape format nor differentiate | 
between ANSI formatted and unformatted tapes. The routine simply reads the 
requested transfer byte count starting at the current tape position. READ terminates 
when either: | 


1. The specified number of bytes have been read. In this case, RO<63:60> is set to 
‘0000’. 


2. An inter-record gap is encountered. In this case, the tape is positioned to the | 
next position after the gap and R0<63:60> is set to ‘0000’. 


3. A tape mark is encountered. In this case, tape is positioned to the next position 
after the tape mark and R0<63:60> is set to ‘0100’. (Note that after calling READ 
and finding a tape mark, the caller can determine if the logical End-Of-Volume 
or an empty file section has been found by calling READ again. The condition 
exists if the second READ returns with zero bytes read and a tape mark found.) 


4. The routine runs off the end of tape. In this case, R0<63:60> is set to ‘1001’.
READ ignores End-Of-Tape (EOT) markers.


For disk devices, READ does not understand the file structure of the device. The 
routine simply reads the requested transfer byte count starting at the logical block 
number specified by R20. The transfer continues until either the specified number 
of bytes has been read or the last logical block on the device has been read. If the 
logical end of the device is encountered, then R0<63:62> is set to ‘01’.


For network devices, READ interprets and removes any device-specific or protocol- 
specific packet headers. If a packet has been received, the remainder of the packet is 
copied into the specified buffer. If a packet has not been received, the routine returns 
with R0<31:0> set to ‘0’. Only those network packets which are specifically addressed 
to this system and are of the specified protocol type are returned; broadcast packets 
are not returned. The actual packet size is dependent on the device and protocol; 
the characteristics of the network device and protocol are specified at the time of the 
channel OPEN. 


The buffer pointed to by R19 should be mapped and kernel write accessible; the 
return address indicated by R26 should be mapped and kernel executable. 


2.3.5.5 WRITE - Write Generic I/O Device


Format:


wcount = DISPATCH ( WRITE,channel,count,address,block )


Inputs:


WRITE   = R16; WRITE function code - 14₁₆
channel = R17; channel number of device to be accessed
count   = R18; number of bytes to be written (should be multiple of the
               device’s record length) (unsigned)
address = R19; virtual address of buffer to read data from
block   = R20; logical block number of data to be written (used only by
               disk devices)
arginfo = R25; argument information
retadr  = R26; return address
procval = R27; procedure value

Outputs:

wcount  = R0;  number of bytes written and status:


               R0<63>     ‘0’  success
                          ‘1’  failure
               R0<62>     ‘1’  EOT or Logical End of Device condition encountered
                          ‘0’  otherwise
               R0<61>     ‘1’  illegal record length specified
                          ‘0’  otherwise
               R0<60>     ‘1’  run off end of tape
                          ‘0’  otherwise
               R0<59:32>  device-specific error status
               R0<31:0>   number of bytes actually written (unsigned)


WRITE causes data to be written to the generic I/O device designated by the channel
number in R17 and read from a memory buffer pointed to by R19. The 32-bit
transfer byte count, hence length of the buffer, is contained in R18. The buffer must
be quadword aligned, virtually mapped, and resident in physical memory.


WRITE returns transfer status in R0<63:60> and the number of bytes actually written,
if any, in R0<31:0>. If the routine is successful, R0<63> is set to ‘0’. If an error
is encountered accessing the device, R0<63> is set to ‘1’. Additional device-specific
status may be returned in R0<59:32>.


The transfer byte count should be a multiple of the record length of the device. If the
specified byte count is not a multiple of the record length, R0<61> is set to ‘1’. If the
count exceeds the record length, the count is rounded down to the nearest multiple
of the record length and WRITE attempts to write that number of bytes. If the
record length exceeds the count, it is UNPREDICTABLE whether WRITE attempts
to access the device. If no write attempt is made, R0<63> is set to ‘1’.


For magnetic tape devices, WRITE does not interpret the tape format nor differentiate
between ANSI formatted and unformatted tapes. The routine simply writes
the requested transfer byte count starting at the current tape position. WRITE
terminates when either:


1. The specified number of bytes have been written without detecting an End-Of-Tape
(EOT) marker. In this case, R0<63:60> is set to ‘0000’.


2. The specified number of bytes have been written and an End-Of-Tape (EOT)
marker was detected. In this case, R0<63:60> is set to ‘0100’.


3. The routine runs off the end of tape. In this case, R0<63:60> is set to ‘1001’.


For disk devices, WRITE does not understand the file structure of the device. The 
routine simply writes the requested transfer byte count starting at the logical block 
number specified by R20. The transfer continues until either the specified number 
of bytes has been written or the last logical block on the device has been written. If 
the logical end of the device is encountered, then R0<63:62> is set to ‘01’.


For network devices, WRITE appends any device-specific or protocol-specific headers.
The routine transmits the requested number of bytes with the proper
network protocol over the appropriate network. The actual packet size is dependent
on the device and protocol; the characteristics of the network device and protocol are
specified at the time of the channel OPEN.


The buffer pointed to by R19 should be mapped and kernel write accessible; and the 
return address indicated by R26 should be mapped and kernel executable. 
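
The R0 layout returned by READ and WRITE can be unpacked with plain shifts and masks. The following is a minimal sketch; the type and function names are illustrative assumptions, and R0 is taken to be available as a 64-bit unsigned integer.

```python
from collections import namedtuple

TransferStatus = namedtuple(
    "TransferStatus",
    ["failure", "end_condition", "illegal_length",
     "ran_off_tape", "device_status", "byte_count"],
)

def decode_transfer_status(r0):
    """Split a READ/WRITE R0 value into its documented fields."""
    r0 &= (1 << 64) - 1                        # treat R0 as 64 bits
    return TransferStatus(
        failure=bool((r0 >> 63) & 1),          # R0<63>
        end_condition=bool((r0 >> 62) & 1),    # R0<62>
        illegal_length=bool((r0 >> 61) & 1),   # R0<61>
        ran_off_tape=bool((r0 >> 60) & 1),     # R0<60>
        device_status=(r0 >> 32) & 0xFFFFFFF,  # R0<59:32>, 28 bits
        byte_count=r0 & 0xFFFFFFFF,            # R0<31:0>
    )
```

A value with R0<63:60> equal to ‘0100’ therefore decodes as a successful transfer on which the EOT or logical end of device condition was seen.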





2.3.6 Console Environment Variable Routines 


System software accesses the environment variables indirectly through console callback
routines. These routines may be invoked while the operating system is fully
functional as well as during operating system bootstrap or crash. The GET_ENV,
SET_ENV, and RESET_ENV routines are subject to the constraints given in Section 2.3.1.
These routines must:


1. Not alter the current IPL or current mode.
   These routines must be invoked in kernel mode.
2. Not alter the existing memory management policy.
   All internal pointers must be remapped by FIXUP.
3. Not block interrupts.
   The operating system must be capable of continuing to receive hardware and
   software interrupts.


The constraints on SAVE_ENV differ; see Section 2.3.6.3. 


The time necessary for these routines to complete is UNPREDICTABLE; however,
a console implementation will attempt to minimize the time whenever possible.


SOFTWARE NOTE 
To permit use of these routines by OpenVMS, implemen- 
tations must limit the execution time to significantly 
less than the interval clock interrupt period. 


The console implementation must ensure that any access to an environment variable 
is atomic. The console implementation must resolve multiple competing accesses by 
system software as well as competing accesses by system software and the console 
presentation layer. See Section 2.5.3.1. 


When invoking these routines, system software must be executing in kernel mode. If 
these routines are invoked in other modes, their execution causes UNPREDICTABLE 
operation. 


These routines may be invoked on both the primary and secondary processors in 
a multiprocessor configuration. System software is recommended to serialize com- 
peting accesses to a given environment variable; a stale value may be returned if 
GET_ENV is invoked simultaneously with SET_ENV or RESET_ENV. 


2.3.6.1 GET_ENV - Get an environment variable


Format:


status = DISPATCH ( GET_ENV,ID,value,length )


Inputs:


GET_ENV = R16; GET_ENV function code - 22₁₆
ID      = R17; ID of environment variable
value   = R18; starting virtual address of byte stream to contain returned value
length  = R19; number of bytes in byte stream (unsigned)
arginfo = R25; argument information
retadr  = R26; return address
procval = R27; procedure value

Outputs:

status  = R0;  status:


               R0<63:61>  ‘000’  success
                          ‘001’  success, byte stream truncated
                          ‘110’  failure, variable not recognized
               R0<60:32>  SBZ
               R0<31:0>   count of bytes returned (unsigned)


GET_ENV causes the value of the environment variable specified by the ID in R17
to be returned in the byte stream specified by the virtual address in R18. The size
in bytes of the byte stream is contained in R19.


GET_ENV returns status in R0<63:61>. If the environment variable is recognized,
R0<63:62> is set to ‘00’, its current value is copied into the byte stream, and R0<31:0>
is set to the number of bytes copied. If the value must be truncated, R0<61> is set
to ‘1’. If the variable is not recognized, R0<63:61> is set to ‘110’ and R0<31:0> is set
to ‘0’.


The byte stream indicated by R18 should be mapped and kernel write accessible;
the return address indicated by R26 should be mapped and kernel executable.
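
The status encoding described above can be modeled in a few lines. This is a sketch only: the dictionary standing in for the console's variable store, the function name, and the variable ID used in the usage note are hypothetical, and the real routine is reached through DISPATCH rather than called directly.

```python
def get_env(variables, var_id, length):
    """Model GET_ENV: return (r0, byte_stream).

    variables maps an environment variable ID to its value (bytes).
    length is the size in bytes of the caller's byte stream (R19).
    """
    if var_id not in variables:
        # Not recognized: R0<63:61> = '110', R0<31:0> = 0.
        return 0b110 << 61, b""
    value = variables[var_id]
    data = value[:length]                 # copy at most `length` bytes
    truncated = len(value) > length       # reported in R0<61>
    r0 = ((0b001 << 61) if truncated else 0) | len(data)
    return r0, data
```

With a hypothetical variable 0x40 holding b"example", a 16-byte stream receives all 7 bytes with status ‘000’, while a 3-byte stream receives b"exa" with status ‘001’.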


2.3.6.2 RESET_ENV - Reset an environment variable


Format:


status = DISPATCH ( RESET_ENV,ID,value,length )


Inputs:


RESET_ENV = R16; RESET_ENV function code - 21₁₆
ID        = R17; ID of environment variable
value     = R18; starting virtual address of byte stream to contain returned value
length    = R19; number of bytes in byte stream (unsigned)
arginfo   = R25; argument information
retadr    = R26; return address
procval   = R27; procedure value

Outputs:

status    = R0;  status:


                 R0<63:61>  ‘000’  success
                            ‘001’  success, byte stream truncated
                            ‘100’  failure, variable read-only
                            ‘101’  failure, variable read-only, byte stream truncated
                            ‘110’  failure, variable not recognized
                 R0<60:32>  SBZ
                 R0<31:0>   count of bytes returned (unsigned)


RESET_ENV causes the environment variable specified by the ID in R17 to be reset
to the system default value and that default value to be returned in the byte stream
specified by the virtual address in R18. The size in bytes of the byte stream is
contained in R19.


RESET_ENV returns status in R0<63:61>. If the environment variable is successfully
reset to the default value, R0<63:62> is set to ‘00’. If the variable is recognized
but read-only, the value is unchanged and R0<63:62> is set to ‘10’. In both cases,
the default value is copied into the byte stream and R0<31:0> is set to the number
of bytes copied; if the value must be truncated, R0<61> is set to ‘1’. If the variable
is not recognized, R0<63:61> is set to ‘110’ and R0<31:0> is set to ‘0’.


The byte stream indicated by R18 should be mapped and kernel write accessible; 
the return address indicated by R26 should be mapped and kernel executable. 


2.3.6.3 SAVE_ENV - Save current environment variables


Format:


status = DISPATCH ( SAVE_ENV )


Inputs:


SAVE_ENV = R16; SAVE_ENV function code - 23₁₆
arginfo  = R25; argument information
retadr   = R26; return address
procval  = R27; procedure value

Outputs:

status   = R0;  status:


                R0<63:61>  ‘000’  success, all values saved
                           ‘001’  success, some bytes saved, additional values to be saved
                           ‘110’  failure, routine unsupported
                           ‘111’  failure, error encountered saving values
                R0<60:0>   SBZ


SAVE_ENV attempts to update the non-volatile storage of those environment vari- 
ables which must be retained across console initializations and system power tran- 
sitions. These environment variables are identified as “NV” in Table 2-5. 


PROGRAMMING NOTE

For example, SAVE_ENV may cause an EEPROM to be 
updated. That update may write all “NV” environment 
variable values to the EEPROM, or may only write those 
variables which have been modified since the last update 
or console initialization. 


This routine is not subject to the constraints given in Section 2.3.6. The console may
usurp operating system control of the system platform hardware, but must restore
any such control or altered state prior to return. The console must not service any
interrupts or exceptions which are otherwise intended for the operating system.


The non-volatile storage update may take significant time and multiple invocations
of SAVE_ENV may be necessary. The time necessary for this routine to complete
is UNPREDICTABLE. A console implementation will attempt to minimize the time
whenever possible and must return in a timely fashion. The routine must return
after partial operation completion if necessary. It is the responsibility of the console
to ensure that subsequent calls make forward progress. The operating system may
delay for extended periods between subsequent calls; the console must not rely on
timely invocations of SAVE_ENV.


IMPLEMENTATION NOTE
To permit use of these routines by OpenVMS, implementations
must limit the execution time to significantly
less than the interval clock interrupt period. A return
after partial operation completion is preferable to long
latency.


SAVE_ENV returns status on the update in R0<63:61>. When the update has successfully
completed and all relevant variables have been saved, the routine returns
with R0<63:61> set to ‘000’. If SAVE_ENV returns after only a partial update to
ensure timely response, R0<63:61> is set to ‘001’. If an unrecoverable error is
encountered, the routine returns with R0<63:61> set to ‘111’; in that case, the contents
of the non-volatile storage are UNDEFINED.


Implementation of SAVE_ENV is optional. If the console does not support SAVE_ 
ENV, the routine returns with R0<63:61> set to ‘110’. 
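
Because SAVE_ENV may return after a partial update, a caller typically repeats the call until the status reports completion. The following is a minimal sketch of such a loop. The save_env callable, which is assumed to invoke the routine and return R0, and the max_calls bound are illustrative assumptions; the architecture places no bound on the number of calls.

```python
SAVED_ALL = 0b000      # success, all values saved
SAVED_PARTIAL = 0b001  # success, additional values to be saved
UNSUPPORTED = 0b110    # failure, routine unsupported

def save_all(save_env, max_calls=1000):
    """Call save_env() until all values are saved; return the call count."""
    for calls in range(1, max_calls + 1):
        status = (save_env() >> 61) & 0b111   # R0<63:61>
        if status == SAVED_ALL:
            return calls
        if status == UNSUPPORTED:
            raise NotImplementedError("console does not support SAVE_ENV")
        if status != SAVED_PARTIAL:
            # '111' (or any undefined code): non-volatile storage is UNDEFINED
            raise RuntimeError("error encountered saving values")
    raise TimeoutError("SAVE_ENV did not complete within max_calls")
```

The caller may delay between iterations as long as it likes, since the console must not rely on timely invocations.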


On a multiprocessor system with an embedded console, the routine must be invoked 
on each processor in the configuration. See Section 3.7.3. 


It is recommended that system software ensure that calls to SET_ENV or RESET_ENV
are not issued while an update operation is in progress on any processor; otherwise,
it is UNPREDICTABLE whether the updated environment value is saved.


The return address indicated by R26 should be mapped and kernel executable. This 
routine does not affect the current value of any environment variable maintained by 
the console. 


2.3.6.4 SET_ENV - Set an environment variable


Format:


status = DISPATCH ( SET_ENV,ID,value,length )


Inputs:


SET_ENV = R16; SET_ENV function code - 20₁₆
ID      = R17; ID of environment variable
value   = R18; starting virtual address of byte stream containing value
length  = R19; number of bytes in byte stream (unsigned)
arginfo = R25; argument information
retadr  = R26; return address
procval = R27; procedure value

Outputs:

status  = R0;  status:


               R0<63:61>  ‘000’  success
                          ‘100’  failure, variable read-only
                          ‘110’  failure, variable not recognized
                          ‘111’  failure, byte stream exceeds value length
               R0<60:32>  SBZ
               R0<31:0>   maximum value length (unsigned)


SET_ENV causes the environment variable specified by the ID in R17 to have the
value specified by the byte stream pointed to by the virtual address in R18.
The size in bytes of the byte stream is contained in R19.


SET_ENV returns status in R0<63:61>. If the environment variable is successfully
set to the new value, R0<63:61> is set to ‘000’. If the variable is not recognized,
R0<63:61> is set to ‘110’. If the variable is read-only, the value is unchanged and
R0<63:61> is set to ‘100’. If the input byte stream exceeds the maximum value
length, the value is unchanged and R0<63:61> is set to ‘111’. In all cases, the
maximum value length is returned in R0<31:0>.


The byte stream indicated by R18 should be mapped and kernel read accessible; the 
return address indicated by R26 should be mapped and kernel executable. 
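
Since SET_ENV returns the maximum value length in every case, a caller can decode both the outcome and that limit from a single R0 value. A minimal sketch follows; the function name and the tuple it returns are illustrative assumptions.

```python
SET_ENV_STATUS = {
    0b000: "success",
    0b100: "failure, variable read-only",
    0b110: "failure, variable not recognized",
    0b111: "failure, byte stream exceeds value length",
}

def decode_set_env(r0):
    """Return (ok, description, maximum_value_length) for a SET_ENV R0."""
    code = (r0 >> 61) & 0b111                 # R0<63:61>
    description = SET_ENV_STATUS.get(code, "undefined status code")
    return code == 0b000, description, r0 & 0xFFFFFFFF   # R0<31:0>
```

On a ‘111’ return, the third element tells the caller how many bytes the variable can hold, so the value can be shortened before retrying.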


2.3.7 Miscellaneous Routines


2.3.7.1 FIXUP - Fixup virtual addresses in console routines


Format:


status = FIXUP ( NEW_BASE_VA,HWRPB_VA )


Inputs:


NEW_BASE_VA = R16; New starting virtual address of the console callback routines
HWRPB_VA    = R17; New starting virtual address of the HWRPB
arginfo     = R25; argument information
retadr      = R26; return address
procval     = R27; procedure value

Outputs:

status      = R0;  status:


                   R0<63>    ‘0’  success
                             ‘1’  failure
                   R0<62:0>  SBZ


FIXUP adjusts virtual address references in all other console callback routines using
the new starting virtual address in R16, the new starting virtual address of the
HWRPB in R17, and the current contents of the CRB. See Section 2.3.8.1.2 for a full
description of FIXUP usage and functionality.


If FIXUP is successful, it returns with R0<63> set to ‘0’. If FIXUP is not successful,
console internal state has been compromised. The console attempts a cold bootstrap
if the state transition in Figure 3-1 indicates a bootstrap and the BOOT_RESET
environment variable is set to “ON” (4E4F₁₆). Otherwise, the system remains in
console I/O mode.
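
The hexadecimal value quoted for “ON” is what results from reading the two ASCII characters as a little-endian integer. That reading is an assumption inferred from the quoted value, and the helper name below is hypothetical.

```python
def ascii_as_little_endian(text):
    """Interpret an ASCII string as a little-endian unsigned integer."""
    return int.from_bytes(text.encode("ascii"), "little")

# 'O' is 4F (hex) and 'N' is 4E (hex); the first character is the low byte.
print(format(ascii_as_little_endian("ON"), "X"))  # prints 4E4F
```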


This routine must be called in kernel mode and in the context of the existing memory
mapping; otherwise its execution causes UNPREDICTABLE or UNDEFINED
operation.


SOFTWARE NOTE
FIXUP is generally called while the bootstrap address
space mapping is in effect.


The return address indicated by R26 should be mapped and kernel executable. 


2.3.7.2 PSWITCH - Switch Primary Processors


Format:


status = DISPATCH ( PSWITCH,action )


Inputs:


PSWITCH = R16; PSWITCH function code - 30₁₆
action  = R17; action requests:
               R17<1:0>   ‘01’  transition from primary
                          ‘10’  transition to primary
                          ‘11’  switch primary
               R17<63:2>  SBZ
cpu_id  = R18; new primary CPU ID
arginfo = R25; argument information
retadr  = R26; return address
procval = R27; procedure value

Outputs:

status  = R0;  status:


               R0<63>    ‘0’  success
                         ‘1’  failure, operation not supported
               R0<62:0>  implementation-specific error status


PSWITCH attempts to perform any implementation-specific functions necessary to 
support primaryness switching. R17 indicates the requested primary transition ac- 
tion. R18 contains the CPU ID (WHAMI IPR) of the new primary. 


PSWITCH is invoked by the old primary, the secondary which is to become the 
new primary, or both. See Section 3.4.6 for a full description of PSWITCH usage, 
functionality, and error returns. 


If PSWITCH is successful, it returns with R0<63> set to ‘0’. If PSWITCH is unsuccessful
for any reason, it returns with R0<63> set to ‘1’ and implementation-specific
status in R0<62:0>.


PSWITCH is invoked at IPL 31. The return address indicated by R26 should be 
mapped and kernel executable. 







2.3.8 Console Callback Routine Data Structures 


The console and system software share two data structures which are necessary for
the console callback routines. These are the Console Routine Block (CRB) and the
Console Terminal Block (CTB) Table. Both are located by offset fields in the HWRPB
as shown in Figure 2-4.


The CRB locates all addresses necessary for console callback routine function. The
base physical address of the CRB is obtained by adding the CRB OFFSET field at
HWRPB[192] to the base physical address of the HWRPB. The CRB format is shown
in Figure 2-5 and described in Table 2-9.


The CTB Table contains information necessary to describe the console terminal devices.
The base physical address of the CTB Table is obtained by adding the CTB
TABLE OFFSET field at HWRPB[184] to the base physical address of the HWRPB.
The CTB format is shown in Figure 2-6 and described in Table 2-10.
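
The two address computations above amount to reading an offset from the HWRPB and adding it to the HWRPB base. The sketch below assumes that each offset field is a little-endian unsigned quadword and that a copy of the HWRPB is available as a bytes object; the function and constant names are hypothetical.

```python
import struct

CTB_TABLE_OFFSET_FIELD = 184   # HWRPB[184]
CRB_OFFSET_FIELD = 192         # HWRPB[192]

def locate_console_structures(hwrpb_pa, hwrpb_bytes):
    """Return (crb_pa, ctb_table_pa) given the HWRPB base PA and contents."""
    # The two offset fields are adjacent, so read both in one call.
    ctb_offset, crb_offset = struct.unpack_from(
        "<QQ", hwrpb_bytes, CTB_TABLE_OFFSET_FIELD)
    return hwrpb_pa + crb_offset, hwrpb_pa + ctb_offset
```

Both results are physical addresses, because the offsets are relative to the physical base of the HWRPB.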


Figure 2-4: Console Data Structure Linkage


[Figure: the HWRPB contains the Offset to CTB and Offset to CRB fields. The CRB
contains the VA and PA of the DISPATCH procedure value, the VA and PA of the
FIXUP procedure value, the number of entries in the map, the number of pages
in the map, and the virtual/physical map. The DISPATCH procedure value locates
a procedure descriptor (first quadword, then the VA of the DISPATCH entry),
which in turn locates the DISPATCH procedure.]


2.3.8.1 Console Routine Block 


Prior to transferring control to system software, the console ensures that the console 
callback routines, console-private data structures, and associated local I/O space 
locations are mapped into region 0 of initial bootstrap address space. All necessary 
pages are located by the Console Routine Block (CRB). 


Figure 2-5: Console Routine Block


 63                                                     0
| Virtual Address of DISPATCH Procedure Descriptor       | :CRB
| Physical Address of DISPATCH Procedure Descriptor      | :+08
| Virtual Address of FIXUP Procedure Descriptor          | :+16
|                          ...                           |
| Virtual Address for Entry Last                         |
| Physical Address for Entry Last                        |
| Page Count for Entry Last                              |


Table 2-9: CRB Fields


Offset  Description

CRB     DISPATCH VA - The virtual address of the procedure descriptor for the
        DISPATCH procedure.

+08     DISPATCH PA - The physical address of the procedure descriptor for the
        DISPATCH procedure.

+16     FIXUP VA - The virtual address of the procedure descriptor for the FIXUP
        procedure.

+24     FIXUP PA - The physical address of the procedure descriptor for the FIXUP
        procedure.

+32     ENTRIES - The number of entries in the virtual-physical map. Unsigned integer.

+40     PAGES - The total number of physical pages to be mapped. Unsigned integer.

+48     ENTRY - Each entry identifies a collection of physically contiguous pages to be
        mapped. Each map entry consists of three quadwords:

        Offset  Name         Description

        +00     ENTRY_VA     Base virtual address for entry

        +08     ENTRY_PA     Base physical address for entry

        +16     ENTRY_PAGES  Number of contiguous physical pages to be mapped.
                             Unsigned integer.


The CRB must be quadword aligned. The DISPATCH and FIXUP addresses must be 
quadword aligned; all unused bits SBZ. The ENTRY addresses must be page aligned 
and all unused bits SBZ. 


The DISPATCH and FIXUP procedure descriptors located by DISPATCH_PA,
DISPATCH_VA, FIXUP_PA and FIXUP_VA must be contained within the pages located
by the first virtual-physical map entry.


2.3.8.1.1 Console Routine Block Initialization 


Prior to transferring control to system software, the console initializes all fields of
the CRB. The console fills in all physical and virtual address fields, the number
of entries in the virtual-physical map (ENTRIES), the total number of pages to be
mapped (PAGES), and the virtual addresses contained in the procedure descriptors
for the DISPATCH and FIXUP procedures¹. PAGES is the sum of the contents of
all ENTRY_PAGES fields.
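
The CRB layout and the PAGES invariant can be checked mechanically. The sketch below parses a CRB image using the offsets given in Table 2-9; it assumes every field is a little-endian unsigned quadword and that the CRB contents are available as a bytes object. The function name and the returned dictionary keys are hypothetical.

```python
import struct

def parse_crb(crb):
    """Parse a CRB image and verify that PAGES equals the sum of ENTRY_PAGES."""
    (dispatch_va, dispatch_pa, fixup_va, fixup_pa,
     entries, pages) = struct.unpack_from("<6Q", crb, 0)
    # Map entries start at +48; each is three quadwords (VA, PA, page count).
    entry_map = [struct.unpack_from("<3Q", crb, 48 + 24 * i)
                 for i in range(entries)]
    if pages != sum(count for _va, _pa, count in entry_map):
        raise ValueError("PAGES does not equal the sum of ENTRY_PAGES")
    return {
        "dispatch_va": dispatch_va, "dispatch_pa": dispatch_pa,
        "fixup_va": fixup_va, "fixup_pa": fixup_pa,
        "pages": pages, "map": entry_map,
    }
```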


All addresses are initially mapped within region 0 of the initial bootstrap address
space. These addresses include the contents of the CRB and all addresses contained
within the DISPATCH and FIXUP procedure descriptors. The mapping must permit
kernel access with appropriate read/write/execute access. Note that the KRE, KWE,
and FOx PTE fields are never subsequently altered by system software. The initial
mapping need not be virtually contiguous.


1 Recall from the Alpha calling standard that the second quadword of a procedure descriptor contains the entry address
(virtual) of the procedure itself.


2.3.8.1.2 Console Routine Remapping 


When the console transfers control to the system software, the console callback routines
may be invoked by the system software without additional setup. All necessary
virtual mappings into initial bootstrap address space must be done by the
console prior to transferring control.


The system software may virtually remap the console callback routines. This remap- 
ping permits the system software to relocate the routines to virtual addresses other 
than those assigned in initial bootstrap address space. This relocation requires that 
the console adjust (or fixup) various internal virtual address references. 


The system software invokes the FIXUP routine to enable the console to perform 
the necessary internal relocations. The FIXUP routine virtually relocates all console 
routines and adjusts any console-private virtual address pointers such as those used 
to locate a local I/O device or HWRPB data structure. Note that if system software 
virtually remaps the HWRPB, FIXUP must be invoked prior to calling any other 
console callback routine; it is recommended that system software remap both the 
HWRPB and the console routines together¹. Calling the console callback routines
after the HWRPB has been remapped from its original bootstrap address location 
results in UNDEFINED operation of the system. 


To remap the console callback routines, the system software and the console cooper- 
ate as follows: 


1. System software must be executing on the primary processor in a multiprocessor
system.


2. System software determines the new base virtual address of the HWRPB; this
remapping is optional. System software does not perform any remapping of the
HWRPB at this step.


Note that system software need not remap the memory data descriptor table 
located by HWRPB[200]. See Section 2.1 for a description of the HWRPB and its 
size. 


3. System software determines the new base virtual address of the console callback 
routines. The CRB entries will be mapped into a set of virtually contiguous 
pages. The CRB PAGES field (CRB[40]) is used to determine the number of 
pages that must be mapped. System software does not perform any remapping 
of the console callback routines at this step. 


4. System software passes control to the console by calling FIXUP (NEW_BASE_VA,
HWRPB_VA). NEW_BASE_VA is the new base virtual address as established
in step 3. HWRPB_VA is the new starting virtual address of the HWRPB
as established in step 2.


1 Note that if the HWRPB is remapped but subsequently returned to its original bootstrap address location, the routines
may be successfully invoked after the return of the HWRPB to its original mapping without calling FIXUP.


5. The console first locates the HWRPB, then locates the CRB using the CRB OFFSET
field. The console then locates all internal pointers and adjusts them. All
linkage sections and other console-internal pointers must be modified. These
data structures can be located during FIXUP because the initial bootstrap address
space mapping is in effect; any console-internal pointers are valid until
modified.


Note that system software need not remap the optional CONFIG Block or FRU 
Table located by HWRPB OFFSET fields. If these blocks will be subsequently 
used by the console, they must be located by console-internal pointers and those 
pointers must be modified during FIXUP. 


DISPATCH and FIXUP are not uniquely remapped by the system software. The
FIXUP must update the DISPATCH and FIXUP procedure descriptors located by
CRB[8] and CRB[24]. The physical pages containing the procedure descriptors
and the routines themselves must be included in the virtual-physical map.


Lastly, note that the relative virtual address offsets of the pages located by the
entry map are not guaranteed to be retained across the FIXUP. The initial bootstrap
address mapping of the physical pages located by the entry map is not
required to be virtually contiguous. The system software remapping is required
to be virtually contiguous. Any offsets which cross physical pages may have to
be modified by FIXUP.


6. The console returns from FIXUP. If the FIXUP was not successful, console internal
state has been compromised. The console attempts a cold bootstrap if the
state transition in Figure 3-1 indicates a bootstrap and the BOOT_RESET environment
variable is set to “ON” (4E4F₁₆). Otherwise, the system remains in
console I/O mode.


7. System software updates each virtual-physical map entry of the CRB: 


1. The PTE and TB entries corresponding to the old virtual address range
are invalidated using the old ENTRY_VA and ENTRY_PAGES values.


2. The new starting virtual address is written into the ENTRY_VA. This virtual
address is computed by adding to NEW_BASE_VA the sum of the page counts
(ENTRY_PAGES) of all preceding entries, scaled by the page size.


3. New PTEs are constructed for each physical page. The new PTE FOx and 
protection fields are copied from the original bootstrap address PTE. 


PROGRAMMING NOTE
Note that it is the responsibility of the console
to judiciously set both the protection and FOx
bits in the bootstrap address PTE. In particular,
if the console sets the FOE bit, there is no architectural
guarantee that the console exception
handler will gain control nor any obvious appropriate
response for the operating system handler.


8. System software updates the DISPATCH and FIXUP VAs. The first virtual-physical
map entry locates the page which contains the DISPATCH and
FIXUP procedure descriptors.


9. System software updates all PTEs and invalidates all appropriate TB entries 
associated with the remapped HWRPB and any remapped OFFSET blocks. 


At the completion of this process, the console callback routines are remapped and 
may again be used by system software. Note that since FIXUP itself is relocated, 
system software may remap the routines more than once. 
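
The new ENTRY_VA values written in step 7 follow from laying the entries out back to back, starting at NEW_BASE_VA. A minimal sketch follows. The function name is hypothetical, and the page size is a parameter whose 8192-byte default is only an assumption; the actual page size is implementation-specific.

```python
def new_entry_vas(new_base_va, entry_pages, page_size=8192):
    """Return the new ENTRY_VA for each map entry, in map order.

    entry_pages lists the ENTRY_PAGES value of each entry. Each entry
    starts where the pages of all preceding entries end, so the
    remapped range is virtually contiguous.
    """
    vas = []
    preceding_pages = 0
    for pages in entry_pages:
        vas.append(new_base_va + preceding_pages * page_size)
        preceding_pages += pages
    return vas
```

With three entries of 2, 1, and 3 pages and the assumed page size, the second entry begins 16384 bytes above NEW_BASE_VA and the third 24576 bytes above it.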


2.3.8.2 Console Terminal Block Table


The Console Terminal Block (CTB) Table indicates the current identity and characteristics
of each console terminal device. The CTB Table is the only data structure
shared by the console and system software which describes the terminal devices
accessible by console callback routines.


The CTB Table contains an array of CTBs. Each CTB is a quadword-aligned structure
with format as shown in Figure 2-6 and described in Table 2-10. The index
of the CTB in the CTB Table is the unit number of the terminal device. The CTB
format consists of two parts: a header and a device-specific segment. The format
of the header is common to all CTBs; the format of the device-specific segment is
dependent on the unique device type. Appendix E contains the specification of all
registered CTB formats.


There is ONLY ONE console terminal. The console terminal unit is selected by
the console presentation layer prior to bootstrapping the operating system; see Section 1.3.
Once the operating system is bootstrapped, the console terminal unit should
not be changed by the console presentation layer. Any attempt to do so results in
UNDEFINED operation of the console. Specifically, if the console presentation layer
halts the operating system, alters the console terminal unit, then restarts or continues
operating system execution, the operation of the console is UNDEFINED. The
console terminal unit is identified by the TTY_DEV environment variable.


During console initialization, the console:


1. Locates all console terminal devices.

2. Selects the console terminal.

3. Builds a CTB for each.

4. Initializes the CTB OFFSET field of the HWRPB.

5. Initializes each console terminal device.

6. Records the default state of each console terminal device in its CTB.

7. Records the unit number of the console terminal in the TTY_DEV environment
variable.





Whenever the console changes the state of a console terminal device, the console
must update its CTB to reflect the change. The console may record extended status
on character transfers (GETC/PUTS) in the CTB.


System software uses the CTB to determine console terminal device characteristics.
System software never directly modifies the contents of a CTB; such modifications
can result in UNDEFINED operation of the console terminal device either as the
result of a subsequent call to a console terminal routine or as the result of a console
internal need to access a console terminal device (e.g. as the result of a halt). System
software calls the SET_TERM_CTL console terminal routine to change console
terminal device characteristics.


Figure 2-6: Console Terminal Block 

Table 2-10: CTB Fields

Offset  Description

CTB     DEVICE TYPE - Console terminal device type and format of the
        device-specific segment. Defined device types are:

        Type   Description
        0      No console present
        1      Detached service processor
        2      Serial line UART
        3      Graphics display with LK keyboard connected to serial line UART
        other  Reserved

+08     DEVICE ID - The physical device and channel which sends and receives
        the console terminal stream. This field is necessary for configurations
        which include multiple-channel devices or multiple single-channel
        devices. The field has two subfields:

        Bits     Description
        <63:32>  Device index
        <31:0>   Channel index

        For implementations which support only a single directly-connected
        console terminal device, this field is set to zero. Note that the
        device ID is not necessarily related to the console terminal device
        unit number.

+16     RESERVED - This field is reserved for future expansion and may not be
        used by the console or system software.

+24     DSD LENGTH - This field specifies the number of bytes in the
        device-specific data field, DSD.

+32     DSD - This field contains device-specific data associated with the
        unique console terminal type. Device-specific data may include such
        parameters as baud rate, whether flow control is enabled, and the
        current state of the CAPS LOCK key. The DSD field should contain only
        those items which must be shared between the console and system
        software.
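
As an illustration of the DEVICE ID layout in Table 2-10, the following Python sketch separates the two subfields with a shift and a mask. The function name split_device_id is hypothetical and is not part of any console interface; the sketch assumes only the bit positions given in the table.

```python
def split_device_id(device_id):
    """Split a 64-bit CTB DEVICE ID quadword into (device index, channel index)."""
    device_id &= 0xFFFF_FFFF_FFFF_FFFF        # treat the value as an unsigned quadword
    device_index = device_id >> 32            # bits <63:32>
    channel_index = device_id & 0xFFFF_FFFF   # bits <31:0>
    return device_index, channel_index
```

A DEVICE ID of zero, as used by implementations with a single directly-connected console terminal device, decodes to device index 0 and channel index 0.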


2.4 Interprocessor Console Communications 

Only those communications between a running processor and a console processor are
considered here. Communications paths between running processors are external to 
the console. Communications paths between console processors are internal to the 
console. See Section 2.5.4. 


Commands are transmitted from a running primary to a console secondary; mes- 
sages (and requests) are transmitted from a console secondary to a running primary. 
Commands and messages are passed via Receive (RX) and Transmit (TX) buffers
contained in each per-CPU slot of the HWRPB. The use of these buffers is controlled
by the Receive Buffer Ready (RXRDY) and Transmit Buffer Ready (TXRDY) flags. 
Messages consist of the message symbol as given in Table 1-1. 


PROGRAMMING NOTE 
For example, “?PALREQ?” is passed to request PALcode
loading.


Commands use the command syntax given in Section 1.3. 


The transmit and receive buffers are named from the point of view of the console 
secondary. The console secondary receives commands in the RX buffer and transmits 
messages in the TX buffer. 


2.4.1 Interprocessor Console Communications Flags 


The Receive Buffer Ready (RXRDY) and Transmit Buffer Ready (TXRDY) flags are 
used to control the interprocessor console communications. The RXRDY and TXRDY 
flags are gathered into bitmasks in the HWRPB at HWRPB[296] and HWRPB[304]
respectively. The TXRDY bitmask allows a running primary to quickly determine 
which, if any, of the console secondaries are trying to send messages. 


The running primary sets the appropriate RXRDY flag to indicate to the receiving 
console secondary that a command is contained in the secondary’s RX buffer. The 
secondary is assumed to be polling its RXRDY flag. The RXRDY flag is cleared by 
the secondary after the command has been read from the RX buffer and prior to 
executing the command. 


A console secondary sets its TXRDY flag to indicate to the running primary that 
a message is contained in the secondary’s TX buffer. The console generates an 
interprocessor interrupt to the primary to notify it that a message is ready. System 
software clears the TXRDY flag after the message has been read from the TX buffer 
and prior to processing the message. 


IMPLEMENTATION NOTE 
The TXRDY bitmask minimizes interprocessor interrupt 
service overhead by reducing the number of required 
memory lookups. 
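
A sketch of how system software might use the TXRDY bitmask after an interprocessor interrupt follows. It assumes, as in the sequences of Section 2.4.3, that bit N of the bitmask corresponds to the processor with CPU ID N. The function name pending_senders is hypothetical, and the bitmask is passed in as a plain integer rather than read from the HWRPB.

```python
def pending_senders(txrdy_mask):
    """Return the CPU IDs whose bit is set in a TXRDY bitmask value."""
    cpu_ids = []
    cpu_id = 0
    while txrdy_mask:
        if txrdy_mask & 1:          # bit N set: secondary N has a message waiting
            cpu_ids.append(cpu_id)
        txrdy_mask >>= 1
        cpu_id += 1
    return cpu_ids
```

A single read of the bitmask identifies every sender, so the per-CPU slots of idle secondaries need not be examined.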

2.4.2 Interprocessor Console Communications Buffer Area


Each per-CPU slot of the HWRPB includes an RXTX Buffer Area which provides 
the communications path between processors. The buffer area is controlled by the 
RXRDY and TXRDY flags. The format is shown in Figure 2-7 and described in
Table 2-11.


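Figure 2-7: Inter-Console Communications Buffer

 63                   32 31                    0
+-----------------------+-----------------------+
|         TXLEN         |         RXLEN         | :SLOT+296
+-----------------------+-----------------------+
|                   Rx Buffer                   | :SLOT+304
|                 80(dec) Bytes                 |
+-----------------------------------------------+
|                   Tx Buffer                   | :SLOT+384
|                 80(dec) Bytes                 |
+-----------------------------------------------+
                                                  :SLOT+464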


Table 2-11: Inter-Console Communications Buffer Fields

Offset     Description

SLOT+296   RXLEN - If the bit corresponding to this processor is set in the
           RXRDY bitmask at HWRPB[296], the RXLEN field contains the length in
           bytes of the command in the RX buffer.

+300       TXLEN - If the bit corresponding to this processor is set in the
           TXRDY bitmask at HWRPB[304], the TXLEN field contains the length in
           bytes of the message in the TX buffer.

+304       RX BUFFER - Buffer used by this console secondary to receive a
           command from the running primary. Only command data is passed
           through this buffer; a console secondary does not receive messages
           from the running primary. Commands must end with “<CR><LF>”
           (0A0D₁₆).

+384       TX BUFFER - Buffer used by this console secondary to transmit a
           message to the running primary. Only message data is passed through
           this buffer; a console secondary does not send commands to the
           running primary. Messages must end with the console secondary’s
           prompt, “<CR><LF>Pnn>>>” (3E3E 3Enn nn50 0A0D₁₆).
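
The layout in Table 2-11 can be sketched with Python's struct module. This is an illustration only: the names pack_rxtx_area and command_text are hypothetical, the command text in the usage note is a placeholder, and the sketch assumes that RXLEN and TXLEN are little-endian 32-bit fields and that the length counts every byte stored in the buffer (whether the terminator is counted is not stated here).

```python
import struct

BUFFER_SIZE = 80  # each of the RX and TX buffers is 80 (decimal) bytes


def pack_rxtx_area(command=b"", message=b""):
    """Build the 168-byte area SLOT+296..SLOT+463: RXLEN, TXLEN, RX, TX."""
    if len(command) > BUFFER_SIZE or len(message) > BUFFER_SIZE:
        raise ValueError("text does not fit in an 80-byte buffer")
    # "<II" packs RXLEN then TXLEN as little-endian longwords;
    # "80s" pads each buffer with zero bytes up to 80 bytes.
    return struct.pack("<II80s80s", len(command), len(message), command, message)


def command_text(area):
    """Return the RXLEN bytes of command text held in the RX buffer."""
    rxlen = struct.unpack_from("<I", area, 0)[0]
    return area[8:8 + rxlen]   # the RX buffer starts 8 bytes into the area
```

For example, pack_rxtx_area(b"CMD\r\n") yields an area whose RXLEN is 5 and whose TXLEN is 0; the packed length of 168 bytes matches the span from SLOT+296 to SLOT+464.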


2.4.3 Sending a Command to a Secondary 

The running primary manipulates the secondary’s RXRDY flag and RX buffer in the
following manner to send a command to a console secondary. In the sequence, the
console secondary is assumed to have CPU ID = “N”.





PROGRAMMING NOTE 
The RXRDY flag is a software lock variable; the primary 
and the secondary must use LDQ_L/STQ_C instructions 
to set and clear bit “N”. See Common Architecture, Chap- 
ter 5. 


1. The primary examines bit “N” of the RXRDY bitmask. If the bit is clear, proceed
to step 3.

2. The primary polls bit “N” of the RXRDY bitmask until clear or until some timeout
is reached. If a timeout occurs, system software reports an error and takes
appropriate action.

3. The primary moves the text of the desired console command into the RX buffer
in the secondary’s HWRPB slot (the “Nth” per-CPU slot).

4. The primary sets the length of the command into the RXLEN field in the
secondary’s HWRPB slot (the “Nth” per-CPU slot).

5. The primary sets bit “N” of the RXRDY bitmask to indicate there is a command
waiting.

6. The secondary is assumed to be polling bit “N” of the RXRDY bitmask.

7. When the secondary notices that bit “N” of the RXRDY bitmask is set, it removes
the command from its RX buffer.

8. The secondary clears bit “N” of the RXRDY bitmask, indicating that its RX buffer
is again available.

9. The secondary attempts to process the command.
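
The command sequence above can be modelled in a few lines of Python. This is a single-threaded toy model under stated assumptions: the class Hwrpb and the functions primary_send and secondary_poll are hypothetical names, the timeout is expressed as a poll count, and the atomic LDQ_L/STQ_C update required by the programming note is not modelled.

```python
class Hwrpb:
    """Toy model holding only the fields used by the command sequence."""

    def __init__(self, cpus):
        self.rxrdy = 0                                       # RXRDY bitmask
        self.rxlen = [0] * cpus                              # per-CPU RXLEN
        self.rxbuf = [bytearray(80) for _ in range(cpus)]    # per-CPU RX buffer


def primary_send(hwrpb, n, command, max_polls=1000):
    """Steps 1-5: wait for bit N to clear, fill the buffer, then set bit N."""
    if len(command) > 80:
        raise ValueError("command does not fit in the RX buffer")
    polls = 0
    while hwrpb.rxrdy & (1 << n):              # steps 1-2: poll until clear
        polls += 1
        if polls >= max_polls:
            return False                       # timeout: caller reports the error
    hwrpb.rxbuf[n][:len(command)] = command    # step 3: move the command text
    hwrpb.rxlen[n] = len(command)              # step 4: set RXLEN
    hwrpb.rxrdy |= 1 << n                      # step 5: flag the command
    return True


def secondary_poll(hwrpb, n):
    """Steps 6-8: if bit N is set, copy the command out and clear bit N."""
    if not hwrpb.rxrdy & (1 << n):
        return None                            # nothing waiting
    command = bytes(hwrpb.rxbuf[n][:hwrpb.rxlen[n]])   # step 7
    hwrpb.rxrdy &= ~(1 << n)                           # step 8
    return command                             # step 9 is left to the caller
```

Note that the buffer and RXLEN are written before the flag is set, and that the secondary copies the command out before clearing the flag; the flag is the only handoff point between the two processors.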


2.4.3.1 Sending a Message to the Primary 


The console secondary manipulates its TXRDY flag and TX buffer in the following 
manner to return a message to the running primary. Again, the console secondary 
is assumed to have CPU ID = “N”.


PROGRAMMING NOTE 
The TXRDY flag is a software lock variable; the primary 
and the secondary must use LDQ_L/STQ_C instructions
to set and clear bit “N”. See Common Architecture, Chap- 
ter 5. 


1. The secondary examines bit “N” of the TXRDY bitmask. If the bit is clear, then
proceed to step 3.

2. The secondary polls this bit until it clears or until a long timeout occurs. (See
step 7.)

3. The secondary moves the text of its response message into the TX buffer in the
secondary’s HWRPB slot (the “Nth” per-CPU slot).

4. The secondary sets the length of the message into the TXLEN field in the
secondary’s HWRPB slot (the “Nth” per-CPU slot).

5. The secondary sets bit “N” of the TXRDY bitmask to indicate there is a message
waiting.

6. The secondary issues an interprocessor interrupt to the primary. This is always
done; the primary need not poll for bits in the TXRDY bitmask.

7. The secondary polls the TXRDY bitmask until bit “N” clears or until a long
timeout expires. This prevents the secondary from performing any action which
might cause the message to be lost before the primary can process it.

PROGRAMMING NOTE
The secondary may be restarted once it has transmitted
the error halt message to the primary. However, it must
wait for the primary to have a reasonable chance to
respond to the interprocessor interrupt and process the
message before the restart proceeds since that message is
important visible evidence of the error halt condition. On
the other hand, the secondary shouldn't wait forever for
the primary to respond since the primary may be affected
by the same condition that caused the secondary to error
halt. Hence, the need for a timeout that is of reasonable
length.

8. As a result of the interprocessor interrupt, the primary eventually checks for
console messages by examining the TXRDY bitmask. The primary notices that
bit “N” of the TXRDY bitmask is set.

9. The primary removes the message from the TX buffer.

10. The primary clears bit “N” of the TXRDY bitmask, indicating that the TX buffer
is again available.

11. The primary attempts to process the message.


2.5 Implementation Considerations 


2.5.1 Serial Number and Revision Fields 

The system serial number and revision fields must be distinct from the processor
serial number and revision fields. In particular, on multiprocessing systems, the
system fields must not be simply replicated from the fields of the primary processor.
The system fields must be constant regardless of which processor serves as primary
and must have persistence across processor failures and/or replacement.


This is necessary to permit application software to determine the system identity in
a dependable fashion. An example of such application software is the system error
log. If the system serial number were tied to a given processor, the error log would
report a different serial number if that processor were later unavailable for any reason.





2.5.2 Console Environment Variables 


While the HWRPB is the primary means of communication between the console and 
system software, there are cases for which it is ill-suited: 


• Because the HWRPB resides in main memory, it cannot preserve certain critical
components of the console state across powerfails. This state must be held by
a mechanism that can survive powerfails. This state includes the necessary
information to reboot system software after a powerfail.

• The structure of the HWRPB is too rigid for the direct inclusion of variable-length
console state which may be added after system software bootstraps. Support for
variable-length state through the HWRPB would require the HWRPB to contain
pointers to the actual state. The usage of memory reserved for this actual state
would require negotiation between the console and system software.

• There is a need for the console presentation layer to establish environment
parameters which affect the bootstrapping of system software. Many of these
parameters are set only once, but stay in effect across subsequent bootstraps and in
some cases across the powering down and up of the system. This requirement is
in effect today on both VAXes and DECsystems. The number, format, size, and
legal values of these parameters are established by system software and may
change from one revision to the next; they cannot be predicted by the console.
Using the HWRPB to share these parameters between the console presentation
layer and system software is awkward at best.


The Alpha console solves the requirements of the above cases with one unified ap- 
proach: environment variables. 


The console environment variable routines must present a consistent interface to 
environment variables regardless of the presentation layer and regardless of the 
internal representation. For example, an ISO-LATIN-1 French console presentation
layer could accept and display text for the BOOT_RESET environment variable as
“marche” or “oui” for 4E4F₁₆, and “arrêt” or “non” for 46 464F₁₆, provided that the
values of 4E4F₁₆ or 46 464F₁₆ are returned to system software by GET_ENV.
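
The hexadecimal values quoted for environment variable strings are consistent with the ASCII characters stored with the first character in the low-order byte: 4E4F is ASCII “ON” and 46 464F is ASCII “OFF”. The following Python sketch reproduces that notation; the function name srm_hex is hypothetical and the grouping into four-digit words is an assumption based on how the values are printed in this manual.

```python
def srm_hex(text):
    """Render an ASCII string as hex with the first character in the low-order byte."""
    digits = text.encode("ascii")[::-1].hex().upper()
    # Group into 4-digit words, starting from the low-order (right-hand) end.
    groups = []
    while digits:
        groups.insert(0, digits[-4:])
        digits = digits[:-4]
    return " ".join(groups)
```

With this helper, srm_hex("ON") gives "4E4F" and srm_hex("OFF") gives "46 464F", matching the values returned to system software in the example above.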


Console implementations are recommended to maintain a memory-resident copy of 
all non-volatile environment variables to ensure that the access time to these vari- 
ables remains within acceptable bounds. Examples of non-volatile storage media 
for environment variables include EEPROM, Flash ROM, and a console-private I/O 
device. 


It was necessary to distinguish environment variable values which are static across
console initializations from those which are static across system bootstraps in order
to support OpenVMS Alpha host-based shadow set bootstraps. This separation permits
the operating system to “temporarily” change the environment variables which govern
bootstrapping. During shadow set state transitions, the operating system system disk
must be a known valid shadow set member, and that member or members cannot be
determined until after the initial bootstrap process has completed and the system
initialization has begun. Temporarily altering the environment variables enables the
operating system to reorder the console bootstrap device list to ensure that the
next, rapidly ensuing, bootstrap attempt will use a known valid shadow set member.
Such an alteration is known to be transient and should not affect the normal
bootstrap controls.


Console implementation-specific or system software specific environment variables
may be volatile or non-volatile. The nature of these environment variables is at the
discretion of the console implementation.


2.5.3 Console Callback Routines 
2.5.3.1 System Software Use of Console Callback Routines 


For those console callback routines intended for use while the operating system is 
fully functional, the console implementation must ensure that system software can 
invoke those routines at multiple IPLs and that the execution may be interrupted. 


The console implementation must ensure that internal console state is not corrupted
by conflicting requests by the console presentation layer and system software.


Consider the case of an operating system debugger which gains control of the proces- 
sor during the execution of a console terminal routine invoked by system software 
executing at a lower IPL. The debugger must be able to access the console terminal. 
The console implementation may not block the higher IPL call. Note, however, that 
system software is recommended to serialize such accesses. If routine execution is 
resumed at the lower IPL, the console need not guarantee that the resumed oper- 
ation completes correctly. For example, if the routine requires access which is not 
atomic (for example indirect register access), the console implementation need not 
ensure that the resulting pattern of accesses does not result in an UNPREDICTABLE
condition. 


Similarly, consider the case of a system software invocation of SET_ENV which is 
suspended by a processor halt. If the console presentation layer sets that same en- 
vironment variable and then continues the system, the new environment variable 
value must be either that specified by the console presentation layer OR that spec- 
ified by the system software. The value must not be corrupted even if there is no 
hardware guarantee of atomicity. 


2.5.3.2 Console Terminal Routines 

The console terminal routines are intended to provide a consistent interface to the
“console terminal device”, regardless of the physical realization of that “terminal
device”. The “console terminal device” may be a physical terminal directly connected
to an embedded console by a UART or a graphics application running on a
workstation which is networked to the processor console.


The CTB solves the problem present in previous VAX systems where there are dif- 
ferently formatted data structures describing the same device across the various 
systems. This increases the burden of the operating system to support new systems. 
By requiring all Alpha implementations to use the same CTB format for the same 
device, the burden of supporting new systems by system software is lessened. 


A simple example of a multiple-channel controller is a DZ; multiple serial lines, each
going to a different external device, share one set of CSRs and device characteristics.
Another example of a multiple-channel controller is the DEFNA which supports
multiple, possibly disjoint, Ethernets. A simple example of multiple single-channel
controllers is multiple SGECs, each of which connects to a unique Ethernet.


To simplify system software interrupt handling, it is recommended that
implementations provide separate console terminal transmit and receive interrupts.


2.5.3.2.1 PROCESS_KEYCODE


PROCESS_KEYCODE is intended for use by system software which must acquire
keycodes directly from the console terminal device; PROCESS_KEYCODE translates
the keycode into characters of the currently selected character-set. GETC is the
normal method used to acquire characters from the console terminal; GETC performs
any necessary translation.


CTB information relevant to the translation includes the type of display-keyboard
combination and the current keyboard state. In the process of translation, the
routine may buffer previously entered keycodes in the CTB.


The supported display-keyboard combinations are specific to the console
implementation. Only those combinations which are supported by the console
implementation are processed and translated by PROCESS_KEYCODE.


Examples of severe keyboard errors which may be corrected include the LK401
keycodes: OUTPUT ERROR, INPUT ERROR, and TEST MODE ACKNOWLEDGE.


Examples of keyboard state changes include shifting to uppercase keys, enabling
CAPS LOCK and lighting the CAPS LOCK LED, and activating output flow control
and lighting the HOLD SCREEN LED.


This routine is intended to ease software effort to support graphics workstations in
which the console presentation layer shares the workstation screen.


2.5.3.3 Console Block Storage Routines


These routines are provided for operating systems whose primary bootstrap is not
large enough to carry the necessary I/O drivers to fetch the system image. This
is particularly a problem for ULTRIX, where the primary bootstrap must fit into
logical blocks 1 to 15 of the ULTRIX system disk. The console possesses most of the
capabilities specified in these block storage routines due to boot device requirements.
As such, permitting system software to make use of that functionality seemed both
beneficial and simple to provide.


2.5.3.4 FIXUP


When considering how to make the console routines compliant to the Alpha Calling 
Standard and virtually relocatable, two choices quickly presented themselves. 


1. Provide the physical and virtual addresses of the procedure descriptor for each
routine, ensure that the descriptor existed, and give the pages necessary to map
for it. The resulting relocation would be quite piecemeal and, moreover, did not
address the relocation of any necessary routine-private pointers (e.g. local I/O
device registers).

2. Provide a calling standard compliant interface, DISPATCH, give only the virtual
and physical addresses for its procedure descriptor, and a (gather-scatter) list
of physical pages to be mapped and relocated. The resulting relocation is less
piecemeal and addresses the relocation of any necessary routine-private pointers.


Note that both choices still require a virtual address FIXUP routine for relocation. 


The DISPATCH procedure and the console routines should be consolidated into a few
contiguous physical pages. Implementations should attempt to reduce the necessary
CRB mapping entries.


2.5.4 Interprocessor Console Communications 

Considering the reasonable combinations of the four processor states, the following
communications paths must be provided: 


1. Running processor to running processor. 


These paths are external to the console and independent of which is primary 
or secondary. They are supported by the communications mechanisms within 
the operating system. These paths are used even when the communication is
related to the console. For example, an operating system debugger entered on 
a secondary is responsible for passing characters to and from the primary, and 
thus to the console terminal. 


2. Running primary to/from console secondary. 


The operating system on the primary must be able to send complete console 
commands to a console secondary, for example to start a secondary. A console 
secondary must be able to send messages to the operating system on the run- 
ning primary, for example when the secondary encounters an error halt. Such 
messages may be sent by a secondary at any time. 


It is not necessary for a secondary to send commands to the primary, or for the 
primary to send messages to a secondary. 


3. Console primary to/from running secondary. 


It is unclear what communication is necessary along this path. It is likely that 
whenever the primary halts, the secondaries will eventually block waiting for 
resources locked by the primary. The console primary will support receiving 
complete messages from a running secondary. 


NOTE 
All consoles include a mechanism to force a running 
primary into console I/O mode. Specific secondary 
processors may then be forced into console I/O mode 
using targeted HALT -CPU commands. 


4. Console primary to/from console secondary. 


The console primary must be able to send complete commands to a console
secondary. This allows the primary to update the copy of an implementation-specific
parameter stored in each processor. Such commands are generated internally by
the console program. Also, commands entered at the console terminal which are
intended for a secondary must be forwarded by the primary.


Secondaries must be able to send complete messages to the primary. Because such
messages arrive complete, the primary can easily avoid interleaving messages on the
console terminal.

Revision 4.1, August 12, 1991

1. Replace previous Console Chapter with Console ECO #15

2. Includes 3 chapters and two appendices, renumber I/O Chapter

3. Material substantially changed or rearranged





Chapter 3 
System Bootstrapping (IV) 





This chapter describes the net effects of the action of the console to control the system 
platform hardware. The major system state transitions and the role of the console 
in controlling those transitions are described in Section 3.1.1. When power is applied
to an Alpha system, the console initializes the system as given in Section 3.2. The 
console actions necessary to bootstrap system software are described in Section 3.3. 
These steps include processor initialization (Section 3.3.1.6), memory sizing and test- 
ing (Section 3.3.1.1), building an initial virtual address space (Section 3.3.1.3), and 
loading the bootstrap (Section 3.5). The console actions to restart system software 
are described in Section 3.4. 


3.1 Processor States and Modes 
3.1.1 States and State Transitions

An Alpha processor can be in one of five major states:

1. Powered off - no system power supplied to the processor.

2. Halted - operating system software execution suspended.

3. Bootstrapping - attempting to load and start the operating system software.

4. Restarting - attempting to restart the operating system software.

5. Running - operating system software functioning.


The transitions between the major states are determined by the current state and 
by a number of variables and events, including: 


• Whether power is available to the system.

• The console AUTO_ACTION environment variable.

• The console lock setting.

• The Bootstrap-In-Progress (BIP) flags.

• The Restart-Capable (RC) flags.

• Processor error halts.

• The CALL_PAL HALT instruction.

• Console commands.


The following is a key for Figure 3-1: 

A. Console is unlocked and AUTO_ACTION is “HALT” (544C 4148₁₆).

B. Console is unlocked and AUTO_ACTION is “BOOT” (544F 4F42₁₆).

C. Console is unlocked and AUTO_ACTION is “RESTART” (54 5241 5453 4552₁₆)
or console is locked.

D. Console is unlocked, the processor is forced into console I/O mode.


Figure 3-1: Major State Transitions 


Action Causing                  Initial State
Transition to
Final State                     Off  Halted  Booting  Restart  Running

Powerfail                       Off Off
A and Power Restored            Halted
B and Power Restored            Booting
C and Power Restored            Restart
BOOT and Console Is Locked      Booting
START or CONTINUE or            Running
Console Is Unlocked
Bootstrap Fails or D            Halted
Bootstrap Succeeds              Running
D                               Halted
Restart Fails                   Booting
Restart Succeeds                Running
A and Processor Halts or D      Halted
B and Processor Halts           Booting
C and Processor Halts           Restart


To effect major state transitions, the console obeys these rules: 


• If the console is unlocked when power is restored or when the processor halts,
enter the state selected by the console AUTO_ACTION environment variable.


• If the console is locked when power is restored or when the processor halts,
attempt a processor restart.


• When processor restart fails, attempt a bootstrap of that processor. One cause of
a failed restart is the processor’s RC flag being clear when the console attempts
the restart.


• When system bootstrap fails, halt. One cause of a failed bootstrap is the
processor’s BIP flag being set prior to the console attempting the bootstrap. The
processor that failed bootstrap will halt.


• When system bootstrap or processor restart succeeds, the processor starts
running.


• When the primary processor is halted and the console is unlocked, the console
BOOT command causes a system bootstrap.


• When a secondary processor is halted and the console is unlocked, the console
START -CPU command causes the console to attempt to start that processor
running.


• When a processor is halted and the console is unlocked, the console CONTINUE
command causes the processor to continue running as though no halt was
incurred.


• If the console is unlocked and a specified processor is running or booting or
restarting, that processor is halted by a console HALT -CPU command.


IMPLEMENTATION NOTE 
In an embedded console implementation, the pri- 
mary processor must be forced into the console I/O 
mode prior to issuing the HALT -CPU command; see 
Section 3.7.3. 
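
The automatic transitions in the rules above can be summarized as a small decision function. The Python sketch below is an illustration only: the function names and the state strings are hypothetical, console commands (BOOT, START, CONTINUE, HALT) are not modelled, and AUTO_ACTION is assumed to hold exactly one of the three values named in the key for Figure 3-1.

```python
def state_after_halt_or_power_restored(console_locked, auto_action):
    """State entered when power is restored or the processor halts."""
    if console_locked:
        return "RESTARTING"            # locked console: attempt a processor restart
    return {
        "HALT": "HALTED",
        "BOOT": "BOOTSTRAPPING",
        "RESTART": "RESTARTING",
    }[auto_action]                     # unlocked console: AUTO_ACTION selects the state


def state_after_attempt(state, succeeded):
    """Outcome of a bootstrap or restart attempt."""
    if state not in ("BOOTSTRAPPING", "RESTARTING"):
        raise ValueError("no attempt is in progress in state " + state)
    if succeeded:
        return "RUNNING"
    # A failed restart falls back to a bootstrap; a failed bootstrap halts.
    return "BOOTSTRAPPING" if state == "RESTARTING" else "HALTED"
```

Chaining the two functions shows the worst-case path for a locked console: a halt leads to a restart attempt, a failed restart leads to a bootstrap attempt, and a failed bootstrap leaves the processor halted.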


3.1.2 Major Modes 


In addition to the major states, the console and processor are described as being in 
one of three modes: 


1. Program I/O mode 


The processor is running. The processor interprets instructions, services inter- 
rupts and exceptions, and initiates I/O operations under the control of the oper- 
ating system. 


2. Console I/O mode 


The processor is halted or bootstrapping or restarting. The console provides
control over the system; the operating system has either relinquished control
or has yet to gain control. The operating system does not service interrupts or
exceptions or initiate I/O operations. The actions of the console are determined
by internal console state and commands from the console operator.


3. Console Initialization mode 


The console has yet to acquire control of the processor. The console itself may
also require initialization, such as when power is first applied to the system. 


A given processor may be in one of four modes: 

1. Primary processor in program I/O mode or “running primary” 

2. Primary processor in console I/O mode or “console primary” 

3. Secondary processor in program I/O mode or “running secondary” 
4. Secondary processor in console I/O mode or “console secondary”


As noted in Section 1.1, implementations must include a mechanism to force a pro- 
cessor executing in program I/O mode into console I/O mode. 


3.2 System Initialization 


An Alpha system must be initialized when power is restored. System initialization 
also occurs as the result of a system bootstrap when the BOOT_RESET environment
variable is set to “ON” (4E4F₁₆), or as the result of the console INITIALIZE
command. Initialization involves all implementation-specific, system-wide actions
necessary to give the system the ability to boot system software on the primary pro- 
cessor. Table 3—1 summarizes the effects of initialization as seen by system software. 


Initialization may include initialization of the console itself. During console initial- 
ization, the console must build the HWRPB and all associated data structures nec- 
essary to permit the console to accept console commands and boot system software. 


System initialization may also include any necessary system bus, processor, or I/O 
device initialization. The initialization of a processor performed as part of system 
initialization is not necessarily that performed just prior to transfer of control to 
the operating system bootstrap. See Section 3.3.1.6 for a description of processor 
initialization as seen by system software. 


Table 3-1: Effects of Power-Up Initialization

Processor State                         Initialized State

BIP and RC flags                        Cleared
Reason for halt code                    ‘0’ (bootstrap)
Integer and floating point registers    UNPREDICTABLE
System memory                           Unaffected if preserved by battery backup;
                                        otherwise, UNPREDICTABLE
Environment variables                   Unaffected if non-volatile; otherwise, set
                                        to default
BB_WATCH                                Unaffected
I/O device registers                    UNPREDICTABLE


3.3 System Bootstrapping 


This section describes the operations performed by the Alpha console to locate, load,
and transfer control to a primary bootstrap. The responsibilities of the console and
the initial state seen by system software are presented for multiprocessor and
uniprocessor environments. The actions of the console for cold bootstrap (full
hardware initialization) and warm bootstrap (partial hardware initialization) are
described.


A system bootstrap can occur as the result of a powerfail recovery, a processor halt, 
or an INITIALIZE or BOOT console command. See Section 3.1.1 for a complete 
description of these state transitions. 


3.3.1 Cold Bootstrapping in a Uniprocessor Environment 


This section describes a cold bootstrap in a uniprocessor environment. A system 
bootstrap will be a cold bootstrap when any of the following occur: 


• Power is first applied to the system.

• A console INITIALIZE command is issued and the AUTO_ACTION environment
variable is set to “BOOT” (544F 4F42₁₆).

• The BOOT_RESET environment variable is set to “ON” (4E4F₁₆).

• Requested by system software.

The console must perform the following steps in the cold bootstrap sequence.

1. Perform a system initialization

2. Size memory

3. Test sufficient memory for bootstrapping

4. Load PALcode

5. Build a valid Hardware Restart Parameter Block (HWRPB)

6. Build a valid Memory Data Descriptor Table in the HWRPB

7. Initialize bootstrap page tables and map initial regions

8. Locate and load the system software primary bootstrap image

9. Initialize processor state on all processors

10. Transfer control to the system software primary bootstrap image


The steps leading up to the transfer of control to system software may be
performed in any order. The final state seen by system software is defined, but the
implementation-specific sequence of these steps is not. Prior to beginning a
bootstrap, the console must clear any internally pended restarts to any processor.


3.3.1.1 Memory Sizing and Testing


Memory sizing is the responsibility of the console. The console must also test
sufficient memory to permit control to be passed to the primary bootstrap image. The
results of console memory sizing and testing are passed to system software in the
Memory Data Descriptor (MEMDSC) Table located by HWRPB[200].


The MEMDSC Table contains one or more memory cluster descriptors. Each memory 
cluster descriptor describes a physically contiguous extent of physical memory within 
which there are no holes. Cluster descriptors are ordered by increasing physical
address; the range of PFNs described by cluster N is of lower address than the
range of PFNs described by cluster N+1.


The MEMDSC Table must be quadword aligned and both physically and virtually
contiguous. The MEMDSC Table format is shown in Figure 3-2; the memory cluster
descriptor format is shown in Figure 3-3. The size of the MEMDSC Table can be
determined from the number of clusters contained in MEMDSC[16]. The size of the
table and the offset to the last quadword of the table are given by:


MEMDSC_SIZE = ((7 * MEMDSC[16]) + 3) * 8
MEMDSC_END = MEMDSC_SIZE - 8
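
As a worked illustration of the size calculation, the sketch below evaluates both expressions for a given cluster count. The constants reflect the three header quadwords (checksum, IMP_DATA_PA, CLUSTERS) and the seven quadwords of each cluster descriptor; the function names are illustrative only.

```python
HEADER_QUADWORDS = 3    # CHECKSUM, IMP_DATA_PA, CLUSTERS
CLUSTER_QUADWORDS = 7   # MEMC+0 through MEMC+48
QUADWORD_BYTES = 8


def memdsc_size(clusters: int) -> int:
    """Size of the MEMDSC Table in bytes for the given number of clusters."""
    return (CLUSTER_QUADWORDS * clusters + HEADER_QUADWORDS) * QUADWORD_BYTES


def memdsc_end(clusters: int) -> int:
    """Byte offset of the last quadword of the MEMDSC Table."""
    return memdsc_size(clusters) - QUADWORD_BYTES
```

A table holding a single cluster therefore occupies 80 bytes, with its last quadword at offset 72.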


The memory within a cluster is either available to system software or reserved
for console use. Usage within a cluster cannot be mixed; if the cluster contains a
page reserved for console use, system software cannot allocate any page within the
cluster. The memory cluster descriptor contains a cluster usage field which indicates
the cluster availability to system software. Note that the primary bootstrap image
must reside in clusters available to system software.


The memory within each cluster may be fully tested, partially tested, or untested 
by the console. If the memory is untested, no cluster memory bitmap is built. The 
console must test enough memory to allow the primary bootstrap image to be loaded 
and control to be passed to that image. This memory includes: 


• PALcode memory and scratch areas

• CPU logout areas

• Memory bitmaps

• HWRPB and all offset blocks

• Console CRB map entries

• Bootstrap address space page tables

• Primary bootstrap image

• One page for the initial bootstrap stack


Any additional memory testing by the console is implementation-specific. It is the 
responsibility of system software to test any memory untested by the console. 


A cluster bitmap is built if the cluster is available to system software and the console
tests any memory within the cluster. Each page in the cluster is represented by a
bit in the bitmap. A '1' in the bitmap means that the corresponding page is “good”;
the page was tested without error. A '0' in the bitmap means that the corresponding
page is “bad”; the page is either untested or was tested but encountered correctable
(Corrected Read Data) errors or hard (Read Data Substitute) errors.


Cluster bitmaps must be at least quadword aligned and must be an integral number 
of quadwords; any unused bits in the highest addressed quadword MBZ. 
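
The bitmap rules can be sketched as follows. This is a minimal illustration, not a console implementation: it assumes that page i of the cluster corresponds to bit (i mod 64) of quadword (i div 64), counting from the least significant bit, which this section does not state explicitly. The function names are hypothetical.

```python
def bitmap_quadwords(pages: int) -> int:
    """Number of quadwords needed so the bitmap is an integral number of quadwords."""
    return (pages + 63) // 64


def build_bitmap(page_ok: list) -> list:
    """Build a cluster bitmap from per-page test results (True = tested without error).

    Untested or failing pages are left as 0, as are unused bits in the last quadword.
    """
    bitmap = [0] * bitmap_quadwords(len(page_ok))
    for index, ok in enumerate(page_ok):
        if ok:
            bitmap[index // 64] |= 1 << (index % 64)   # assumed bit ordering
    return bitmap


def page_is_good(bitmap: list, index: int) -> bool:
    """Return True if the bit for the given page is '1'."""
    return ((bitmap[index // 64] >> (index % 64)) & 1) == 1
```

A cluster of 66 pages needs two quadwords; the 62 unused high bits of the second quadword stay zero.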


\See Section 3.7.1 for the rationale behind memory clusters, highwater marking, 
and marking Corrected Read Data errors as bad pages.\ 


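Figure 3-2: Memory Cluster Descriptor Table

[Figure: quadword layout, bits 63 through 0, with entries at offsets MEMDSC, +08,
+16, and +24 through MEMDSC_END; the final entry is the last Memory Cluster
Descriptor.]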





Table 3-2: Memory Cluster Descriptor Table Fields

Offset    Description

MEMDSC    CHECKSUM - Checksum which is the 64-bit, 2's complement sum ignoring
          overflows of all the quadwords from MEMDSC+8 through MEMDSC_END.
          The checksum does not include any of the cluster bitmaps nor any optional
          implementation-specific data.

+08       IMP_DATA_PA - Physical address of additional implementation-specific
          information (if any). If no additional implementation-specific information
          exists, the field must contain a zero.

+16       CLUSTERS - Number of clusters in the Memory Cluster Descriptor Table.
          Unsigned integer.

+24       CLUSTER - Each Memory Cluster Descriptor describes an extent of physical
          memory. See Figure 3-3.


Figure 3-3: Memory Cluster Descriptor

[Figure: quadword layout with entries at offsets MEMC, +08, +16, +24, +32, +40,
+48, and +56; the labeled entry is Usage of Cluster.]


Table 3-3: Memory Cluster Descriptor Fields

Offset    Description

MEMC      PFN - Starting PFN of the memory cluster.

+08       PAGES - Number of pages in the memory cluster. Unsigned integer.

+16       TESTED_PAGES - Number of tested memory pages in the cluster. If only a
          limited extent of the cluster memory was tested, a bitmap is built, and this
          field indicates the number of pages that were tested. See Section 3.7.1.

+24       BITMAP_VA - Starting virtual address of the cluster memory testing bitmap
          in the bootstrap address space. If the memory is untested, no bitmap is built
          and this field is set to zero.

+32       BITMAP_PA - Starting physical address of the cluster memory testing bitmap.
          If the memory is untested, no bitmap is built and this field is set to zero.

+40       BITMAP_CHECKSUM - Checksum which is the 64-bit, 2's complement sum
          ignoring overflows of the cluster memory testing bitmap. Computed over the
          PAGES active bits only.

+48       USAGE - Indicates whether the cluster is available for use by system
          software. If USAGE<0> is '0', system software may allocate and use the
          cluster. If USAGE<0> is '1', the cluster is reserved for console use and must
          not be allocated by system software. USAGE<63:1> SBZ.
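
The descriptor layout lends itself to a small decoding sketch. The example below unpacks the seven quadwords of one memory cluster descriptor and tests USAGE<0>. It assumes the descriptor is held in a byte buffer in little-endian order; the class and function names are illustrative only.

```python
import struct
from typing import NamedTuple


class MemoryCluster(NamedTuple):
    """One memory cluster descriptor, in field order MEMC+0 through MEMC+48."""
    pfn: int
    pages: int
    tested_pages: int
    bitmap_va: int
    bitmap_pa: int
    bitmap_checksum: int
    usage: int

    @property
    def available(self) -> bool:
        """True if USAGE<0> is 0, so system software may allocate the cluster."""
        return (self.usage & 1) == 0

    @property
    def has_bitmap(self) -> bool:
        """True if a bitmap was built (BITMAP_PA is non-zero)."""
        return self.bitmap_pa != 0


def parse_cluster(raw: bytes, offset: int = 0) -> MemoryCluster:
    """Decode seven little-endian quadwords starting at offset."""
    return MemoryCluster(*struct.unpack_from("<7Q", raw, offset))
```

System software would consult `available` before allocating from a cluster, and `has_bitmap` before looking up individual pages.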





3.3.1.2 PALcode Loading 


The console loads PALcode into good memory within a memory cluster which is
not available to system software. If PALcode scratch space is required, the console
allocates good memory within a memory cluster which is not available to system
software. PALcode memory and scratch space are at least page aligned. The console
records the starting physical address and length of PALcode memory and scratch
space and then sets the PALcode Memory Valid (PMV) flag in the per-CPU slot of
the primary processor. The PMV flag indicates that the PALcode descriptors are
valid.


After PALcode loading and initialization, the console sets the PALcode Loaded (PL)
and PALcode Valid (PV) flags in the primary's per-CPU slot. The PL flag indicates
that PALcode has been loaded; the PV flag indicates that any necessary PALcode
initialization has been performed.


PALcode loading and initialization is implementation-specific. The PALcode source
may be a special console device, ROM, a system device, a communications line, or
any other implementation-specific source. The state of the console and system must
be such that the source is accessible. The means by which any PALcode internal
state is initialized is implementation-specific.


3.3.1.3 Bootstrap Address Space


\See Section 3.7.5 for a discussion of the structure of the initial bootstrap address
space.\


All system software, including the primary bootstrap image, runs in a virtual
memory environment. The console creates the initial page tables which define the initial
bootstrap address space for the primary bootstrap. System software may replace this
bootstrap address space at any time after the console passes control to the primary
bootstrap image.


The bootstrap address space consists of four regions. All regions must be located in 
good memory within clusters which are available to system software. The regions 
are: 


Region 0 

This region maps all console or PALcode data structures which must be shared with 
system software. These structures include the HWRPB in its entirety, all blocks 
located by HWRPB offsets, the console callback routines, and all memory bitmaps. 
Region 0 begins at address 256MB, virtual address 0000 0000 1000 0000 (hex). The
starting address of the HWRPB is the base of Region 0.


Region 1 

The primary bootstrap image is loaded into this region. The region must be at least
large enough to load system software plus three pages. The three additional pages
are used as an initial bootstrap stack and stack guard pages. The stack guard pages
are virtually adjacent to the bootstrap stack page and marked no-access. All other
pages in the region are mapped and valid. Region 1 begins at address 512MB, virtual
address 0000 0000 2000 0000 (hex).

SOFTWARE NOTE
This region must be set to the size of the primary
bootstrap image plus 3 pages for OpenVMS and at
least 256K for OSF/1 Alpha.


Region 2


This region, or “page table space”, contains the bootstrap address space page tables.
Region 2 begins at address 1GB, virtual address 0000 0000 4000 0000 (hex). The range
is dependent on the page size:


Page Size    Page Table Space Address Range

8KB          1GB to 1GB+8MB
16KB         1GB to 1GB+16MB
32KB         1GB to 1GB+32MB
64KB         1GB to 1GB+64MB


This region includes the level 2 and level 3 page tables used to map all three regions
comprising bootstrap address space. The level 2 page table maps itself as a level 3
page table. The address of the level 2 page table page and the PTE within the page
which is used for self-mapping are also dependent on the page size:


             Virtual Address of     L2PTE Number Used
Page Size    Level 2 Page Table     for Self-mapping

8KB          1GB+1MB                128
16KB         1GB+512KB              32
32KB         1GB+256KB              8
64KB         1GB+128KB              2
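
The self-mapping entries can be derived from the base of page table space. The sketch below assumes the level 2 index of a virtual address occupies VA<32:23>, VA<35:25>, VA<38:27>, and VA<41:29> for 8KB, 16KB, 32KB, and 64KB pages respectively, and that each level 3 page table page occupies one page of page table space. All names are illustrative only.

```python
KB = 1024
PT_BASE = 1 << 30  # 1GB, base of Region 2 (page table space)

# Page size in KB -> (high bit, low bit) of the level 2 index field of a VA.
L2_FIELD = {8: (32, 23), 16: (35, 25), 32: (38, 27), 64: (41, 29)}


def bits(value: int, high: int, low: int) -> int:
    """Extract the bit field value<high:low>."""
    return (value >> low) & ((1 << (high - low + 1)) - 1)


def self_map(page_kb: int) -> tuple:
    """Return (virtual address of the level 2 page table, self-mapping L2PTE number)."""
    l2pte_number = bits(PT_BASE, *L2_FIELD[page_kb])
    l2pt_va = PT_BASE + page_kb * KB * l2pte_number
    return l2pt_va, l2pte_number
```

For 8KB pages this gives L2PTE number 128 and a level 2 page table at 1GB+1MB.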


Figure 3—5 illustrates the initial page tables that map the virtual address regions 
shown in Figure 3—4. 


Region 3 


This region maps the level 1 page table pages. The level 1 page table is self-mapped
by the penultimate PTE in the page. Region 3 exists to support virtual page table
lookup for Translation Buffer misses. Region 3 is not the primary page table space
that is presented to bootstrap software; system software must explicitly map the
level 1 page table page if needed.


PROGRAMMING NOTE 
Due to the self mapping, Region 3 maps all page table
pages. The level 2 and level 3 page table pages are in
both Region 2 and Region 3.





Page Size    Virtual Address of Level 1 Page Table

8KB          2**64-8GB-8MB-16KB
16KB         2**64-64GB-32MB-32KB
32KB         2**64-.5TB-128MB-64KB
64KB         2**64-4TB-.5GB-128KB


Figure 3-4: Initial Virtual Memory Regions

Region 0 (VA=1000 0000 hex)
    HWRPB Pages (includes Memory Data Descriptor Table and CRB)
    Console Service Routines
    Memory Bitmaps

Region 1 (VA=2000 0000 hex)
    Loaded System Software
    1 Page Stack (SP)
    No-Access

Region 2 (VA=4000 0000 hex)
    Unused
    Level 3 Page Table (maps Region 0)
    Unused
    Level 3 Page Table (maps Region 1)
    Unused
    Level 2, 3 Page Table (maps itself and Region 2)


All valid pages allow read/write access from kernel mode and deny all access from
executive, supervisor and user modes. All fault bits (FOR, FOW, FOE) are clear, as
well as Address Space Match (ASM) and Granularity Hint (GH).


The self-mapping of the level 2 page table excludes the level 1 page table page from
Region 2. The level 1 page table has two active PTEs. The first L1PTE points to
the PFN of the level 2 page table page which maps page table space (Region 2). The
penultimate L1PTE contains the PFN of the level 1 page table itself, thus defining
Region 3. Only these two entries within the level 1 page table are valid; all other
level 1 PTEs are zeroes. See Section 3.7.5.


Figure 3-5: Initial Page Tables

[Figure: the PTBR locates the level 1 PT (entries labeled First and Last PTE), which
leads to the level 2 PT. The level 2 PT leads to the level 3 page table for Region 0
(maps VA=256 MB), the level 3 page table for Region 1 (maps VA=512 MB), and an
entry that maps VA=1 GB.]

The level 2 PT maps Region 2 (page table space) at 1 GB. The level 2 PT maps itself
as its own level 3 PT.

The level 1 PT is not mapped.


The self-mapping of the level 2 page table also causes the addresses of the level 2 and 
level 3 PTEs for a given virtual address to be functions of that address. For every 
virtual address within the bootstrap address space, there is exactly one location 
within page table space for the level 2 PTE that maps that virtual address, and 
exactly one location for the level 3 PTE that maps that virtual address. 


Thus, the level 2 and level 3 PTE virtual addresses for a given virtual address (VA)
within bootstrap address space can be calculated given the page size. The following
bit range definitions provide convenient notation for referring to the constituent
parts of a virtual address. For example, “VA<L2>” is equivalent to “VA<32:23>” for
8KB sized pages.


VA:

Page Size    L1       L2       L3

8KB          42:33    32:23    22:13
16KB         46:36    35:25    24:14
32KB         50:39    38:27    26:15
64KB         54:42    41:29    28:16


1. The base of page table space is a constant value:

   PT_Base = 1GB


2. The virtual address of the level 3 PTE (L3PTE_VA) of any virtual address (VA)
   is given by:

   L3PTE_VA(VA) = PT_Base + (page_size * VA<L2>) + (8 * VA<L3>)

   Thus, the virtual address of the level 3 PTE which maps the lowest address of
   page table space is given by:

   L3PTE_VA(PT_Base) = PT_Base + (page_size * PT_Base<L2>)


3. Since the level 2 page table is self-mapped, the above is also the base virtual
   address of the level 2 page table. Thus:

   L2PT_Base = PT_Base + (page_size * PT_Base<L2>)


4. Finally, the virtual address of the level 2 PTE (L2PTE_VA) of any virtual address
   (VA) is given by:

   L2PTE_VA(VA) = L2PT_Base + (8 * VA<L2>)

   L2PTE_VA(VA) = PT_Base + (page_size * PT_Base<L2>) + (8 * VA<L2>)
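
The four relations above can be checked numerically. The following sketch is illustrative only: it assumes the L3 field for 8KB pages is VA<22:13> (the field directly above the 13-bit byte-within-page offset), and the function and table names are not part of the architecture.

```python
KB = 1024
PT_BASE = 1 << 30  # relation 1: base of page table space is 1GB

# Page size in KB -> bit ranges (high, low) of the L2 and L3 index fields.
FIELDS = {
    8:  {"L2": (32, 23), "L3": (22, 13)},
    16: {"L2": (35, 25), "L3": (24, 14)},
    32: {"L2": (38, 27), "L3": (26, 15)},
    64: {"L2": (41, 29), "L3": (28, 16)},
}


def bits(value: int, high: int, low: int) -> int:
    """Extract the bit field value<high:low>."""
    return (value >> low) & ((1 << (high - low + 1)) - 1)


def l3pte_va(va: int, page_kb: int) -> int:
    """Relation 2: virtual address of the level 3 PTE that maps va."""
    f = FIELDS[page_kb]
    return PT_BASE + page_kb * KB * bits(va, *f["L2"]) + 8 * bits(va, *f["L3"])


def l2pt_base(page_kb: int) -> int:
    """Relation 3: the self-mapped level 2 page table is the L3 table for PT_BASE."""
    return l3pte_va(PT_BASE, page_kb)


def l2pte_va(va: int, page_kb: int) -> int:
    """Relation 4: virtual address of the level 2 PTE that maps va."""
    return l2pt_base(page_kb) + 8 * bits(va, *FIELDS[page_kb]["L2"])
```

With 8KB pages, the base of Region 1 (VA 2000 0000 hex) has its level 3 PTE at 4008 0000 (hex) and its level 2 PTE at 4010 0200 (hex) under these assumptions.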


3.3.1.4 Bootstrap Flags


The Bootstrap-In-Progress (BIP) and Restart-Capable (RC) processor state flags in 
the primary processor’s per-CPU slot are used to detect failed bootstraps. If the 
primary reenters console I/O mode while the BIP flag is set and the RC flag is clear, 
the bootstrap attempt fails, and the subsequent console action is determined by 
Figure 3-1. 


The console sets the BIP flag and clears the RC flag prior to transferring control to 
system software. System software sets the RC flag to indicate that sufficient context 
has been established to handle a restart attempt. System software clears the BIP 
flag to indicate that the bootstrap operation has been completed. The RC flag should 
be set prior to clearing the BIP flag. 
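
The console's interpretation of the two flags, as listed in Table 3-4, amounts to a four-entry lookup. A minimal sketch follows; the function name and the returned strings are illustrative, with the strings taken from the table.

```python
def interpret_flags(bip: bool, rc: bool) -> str:
    """Return the Table 3-4 interpretation of BIP and RC at entry to console I/O mode."""
    table = {
        (True, False): "Failed bootstrap",
        (True, True): "Halt condition encountered during bootstrap, restart processor",
        (False, False): "Failed restart",
        (False, True): "Halt condition encountered, restart processor",
    }
    return table[(bip, rc)]
```

Setting RC before clearing BIP, as recommended above, means system software passes from (set, clear) through (set, set) to (clear, set) and never through (clear, clear), the failed-restart state.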





Table 3-4: Console Interpretation of BIP and RC flags

BIP      RC       Interpretation at Entry to Console I/O Mode

set      clear    Failed bootstrap
set      set      Halt condition encountered during bootstrap, restart processor
clear    clear    Failed restart
clear    set      Halt condition encountered, restart processor


3.3.1.5 Loading of System Software


The console is responsible for loading system software at the base of Region 1 begin- 
ning at virtual address 512MB. This software is expected to be a primary bootstrap 
program which is responsible for loading other system software, but may be diagnos- 
tic or other special purpose software. Section 3.5 contains descriptions of the format 
of each supported bootstrap medium. 


The console uses the BOOT_DEV environment variable to determine the bootstrap 
device and the path to that device. These environment variables contain lists of 
bootstrap devices and paths; each list element specifies the complete path to a given 
bootstrap device. If multiple elements are specified, the console attempts to load a
bootstrap image from each in turn.


The console uses the BOOTDEF_DEV, BOOT_DEV, and BOOTED_DEV environment
variables as follows:


1. At console initialization, the console sets the BOOTDEF_DEV and BOOT_DEV
environment variables to be equivalent. The format of these environment
variables is a function of the console implementation and independent of the console
presentation layer; the value may be interpreted and modified by system
software. See Appendix E for a list of current formats.


2. When a bootstrap results from a BOOT command which specifies a bootstrap 
device list, the console uses the list specified with the command. The console 
modifies BOOT_DEV to contain the specified device list. NOTE: This may require
conversion from the presentation layer format to the registered format.


3. When a bootstrap is the result of a BOOT command which does not specify a 
bootstrap device list, the console uses the bootstrap device list contained in the 
BOOTDEF_DEV environment variable. The console copies the value of BOOT- 
DEF_DEV to BOOT_DEV. 


4. When a bootstrap is not the result of a BOOT command, the console uses the 
bootstrap device list contained in the BOOT_DEV environment variable. The 
console does not modify the contents of BOOT_DEV. 


5. The console attempts to load a bootstrap image from each element of the
bootstrap device list. If the list is exhausted prior to successfully transferring control
to system software, the bootstrap attempt fails and the subsequent console action
is determined by Figure 3-1.


6. The console indicates the actual bootstrap path and device used in the
BOOTED_DEV environment variable. The console sets BOOTED_DEV after loading the
primary bootstrap image and prior to transferring control to system software.
The BOOTED_DEV format follows that of a BOOT_DEV list element.


7. If the bootstrap device list is empty (BOOTDEF_DEV or BOOT_DEV is NULL,
00 hex), the action is implementation-specific. The console may remain in console
I/O mode or attempt to locate a bootstrap device in an implementation-specific
manner.
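
The device list rules can be summarized in a short sketch. It is illustrative only: environment variables are modeled as a dictionary of Python lists, the device names are hypothetical, the `load` callback stands in for the implementation-specific image load, and the empty-list case (implementation-specific) is simply reported as failure.

```python
from typing import Callable, Dict, List, Optional


def select_device_list(env: Dict[str, List[str]], boot_command: bool,
                       command_list: Optional[List[str]] = None) -> List[str]:
    """Choose the bootstrap device list and update BOOT_DEV as the rules require."""
    if boot_command and command_list:
        env["BOOT_DEV"] = list(command_list)        # BOOT with an explicit list
    elif boot_command:
        env["BOOT_DEV"] = list(env["BOOTDEF_DEV"])  # BOOT without a list
    # Otherwise (not a BOOT command) BOOT_DEV is used unmodified.
    return list(env["BOOT_DEV"])


def try_bootstrap(env: Dict[str, object], devices: List[str],
                  load: Callable[[str], bool]) -> Optional[str]:
    """Try each list element in turn; record the one that loads in BOOTED_DEV."""
    for device in devices:
        if load(device):
            env["BOOTED_DEV"] = device
            return device
    return None  # list exhausted: the bootstrap attempt fails
```

A caller would pass the result of `select_device_list` to `try_bootstrap` and treat a `None` result as a failed bootstrap attempt.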


The BOOT_FILE and BOOT_OSFLAGS environment variables are used as default
values for the bootstrap filename and option flags. The console indicates the
actual bootstrap image filename (if any) and option flags for the current bootstrap
attempt in the BOOTED_FILE and BOOTED_OSFLAGS environment variables.
The BOOT_FILE default bootstrap image filename is used whenever the bootstrap
requires a filename and either none was specified on the BOOT command or the
bootstrap was initiated by the console as the result of a major state transition. The
console never interprets the bootstrap option flags, but simply passes them between
the console presentation layer and system software.


3.3.1.6 Processor Initialization


Before control is transferred to system software, certain IPRs and other processor 
state must be initialized as shown in Table 3—5. Processor initialization is performed 
by the console prior to booting a processor, prior to restarting a processor, or as the 
result of the INITIALIZE -CPU console command. 


The Context Valid (CV) flag in the processor's per-CPU slot must be set for
processor initialization to be successful. If the CV flag is clear, the HWPCB contained
in the per-CPU slot is not valid, and the console must not transfer control to system
software. In the event of this or any error initializing the processor, the console
retains control of the system and generates the binary error message
ERROR_PROC_INIT.


Table 3-5: Processor Initialization

Processor State                               Initialized State

ASN     Address Space Number                  Zero
ASTEN   AST Enable                            ASTEN in processor's HWPCB
ASTSR   AST Summary                           ASTSR in processor's HWPCB
FEN     Floating Enable                       FEN in processor's HWPCB
IPL     Interrupt Priority Level              31
MCES    Machine Check Error Summary           Zero
PCBB    Privileged Context Block              Address of processor's HWPCB
PS      Processor Status                      IPL=31, VMM=0, CM=K, SW=0
PTBR    Page Table Base Register              PFN value in processor's HWPCB
SISR    Software Interrupt Summary            Zero
WHAMI   Who-Am-I                              CPU identifier
SCC     System Cycle Counter                  Zero
SP      Kernel Stack Pointer                  KSP in processor's HWPCB
Other IPRs                                    UNPREDICTABLE
Cache, instruction buffer, or write buffer    Empty or valid
Translation buffer                            Invalidated
Main memory                                   Unaffected
Integer and floating point registers          Unaffected, except SP
Reason for Halt code                          Unaffected
BIP and RC flags                              Unaffected
Environment variables                         Unaffected


3.3.1.7 Transfer of Control to System Software 


Prior to transferring control to system software, the console must define valid
hardware privileged context for that software. The console builds that context in the
hardware privileged context block (HWPCB) in the primary processor's per-CPU
slot. The initial context is summarized in Table 3-6.


The initial KSP points to the lowest addressed quadword in the higher addressed 
stack guard page (top-of-stack) of Region 1 of the bootstrap address space. The 
PTBR points to the level 1 page table page. All other scalar and floating point 
register contents are UNPREDICTABLE. 


After building HWPCB for the primary, the console sets the Context Valid (CV) flag 
in the primary’s per-CPU slot. All other bootstrap information is passed from the 
console to system software via environment variables. See Section 2.2 for more 
details. 


Table 3-6: Initial HWPCB contents

HWPCB Field     Initialized State

KSP             Top-of-stack (contents of SP)
ESP             UNPREDICTABLE
SSP             UNPREDICTABLE
USP             UNPREDICTABLE
PTBR            PFN of level 1 page table
ASN             Zero
ASTSR           Zero
ASTEN           Zero (all disabled)
FEN             Zero (disabled)
PCC             Zero
Unique Value    Zero
PAL scratch     Implementation-specific


Control is transferred to system software in kernel mode at IPL 31 with virtual
memory management enabled. Control is transferred to the first longword of the
system software image loaded into Region 1, virtual address 0000 0000 2000 0000 (hex).
Prior to transferring control, the console ensures that the SP contains the KSP value
in the HWPCB. System software should assume that the stack is initially empty.


The transfer of control transitions the primary processor from the halted state into 
the running state and from console I/O mode into program I/O mode. The rest of the 
uniprocessor bootstrap process is the responsibility of system software. 


3.3.2 Warm Bootstrapping in a Uniprocessor Environment


The actions of the console on a warm bootstrap are a subset of those for a cold
bootstrap. A system bootstrap will be a warm bootstrap whenever the BOOT_RESET
environment variable is set to “OFF” (46 464F hex) and console internal state permits.


The console performs the following steps in the warm bootstrap sequence.

1. Locate and validate the Hardware Restart Parameter Block (HWRPB)
2. Locate and load the system software primary bootstrap image
3. Initialize processor state on all processors
4. Initialize bootstrap page tables and map initial regions
5. Transfer control to the system software primary bootstrap image


At warm bootstrap, the console does not load PALcode, does not modify the Memory
Data Descriptor Table, and does not reinitialize any environment variables. If the
console cannot locate and validate the previously initialized HWRPB, the console
must initiate a cold bootstrap. Prior to beginning a bootstrap, the console must
clear any internally pended restarts to any processor.





PROGRAMMING NOTE 
Warm bootstrap permits system software to preserve
limited context across bootstraps. See Sections 2.5.2 and
3.7.1.


3.3.2.1 HWRPB Location and Validation 


After console initialization, the console must preserve the location of the HWRPB in 
an implementation-specific manner. On warm bootstraps and restarts, the console 
locates the HWRPB and verifies it by ensuring that: 


1. The first quadword of the table contains the physical address of the table.

2. The second quadword of the table contains “HWRPB” (0000 0042 5052 5748 hex).


3. The quadword at offset HWRPB[288] contains the 64-bit, 2’s complement sum 
ignoring overflows of the quadwords from offset HWRPB[00] to HWRPB[280], 
inclusive, relative to the beginning of the potential HWRPB. 


4. The quadword at offset [0] of the MEMDSC block contains the 64-bit, 2’s com- 
plement sum ignoring overflows of the quadwords from MEMDSC+8 through 
MEMDSC_END of that block. The MEMDSC block is located by the MEMDSC 
OFFSET at HWRPB[200]. See Figure 3-2. 


5. As described in Section 2.1.4, if a CONFIG table exists, it is located by the 
CONFIG OFFSET at HWRPB[208]. The quadword at offset [8] of the optional 
CONFIG table contains the 64-bit, 2’s complement sum ignoring overflows of the 
quadwords from CONFIG+8 through CONFIG_END of that table. 


If any of the above conditions are not true, the HWRPB is not valid. The warm boot- 
strap (or restart) fails. The subsequent console action is determined by Figure 3-1. 
If a bootstrap is indicated, a cold bootstrap will be performed. 
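
The first three validation conditions can be sketched as follows. This is a minimal illustration under stated assumptions: physical memory is modeled as a byte buffer indexed by physical address, quadwords are little-endian, and the MEMDSC and CONFIG checks (conditions 4 and 5) are omitted. The function names are hypothetical.

```python
import struct

MASK64 = (1 << 64) - 1
HWRPB_ID = 0x0000004250525748  # "HWRPB" as stored in the second quadword


def checksum64(mem, first: int, last: int) -> int:
    """64-bit 2's complement sum, ignoring overflow, of quadwords first..last inclusive."""
    total = 0
    for offset in range(first, last + 8, 8):
        total = (total + struct.unpack_from("<Q", mem, offset)[0]) & MASK64
    return total


def hwrpb_is_valid(mem, hwrpb_pa: int) -> bool:
    """Check conditions 1 through 3 for a candidate HWRPB at physical address hwrpb_pa."""
    def quad(offset: int) -> int:
        return struct.unpack_from("<Q", mem, hwrpb_pa + offset)[0]

    return (
        quad(0) == hwrpb_pa                                         # condition 1
        and quad(8) == HWRPB_ID                                     # condition 2
        and quad(288) == checksum64(mem, hwrpb_pa, hwrpb_pa + 280)  # condition 3
    )
```

The caller supplies the preserved HWRPB location; the sketch deliberately takes an address rather than scanning the buffer, in keeping with the rule that the console must not search memory for a HWRPB.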


\The console must not search memory for a HWRPB; searching memory constitutes
a security hole.\


3.3.3 Multiprocessor Bootstrapping 


Multiprocessor bootstrapping differs from uniprocessor bootstrapping primarily in 
areas relating to synchronization between processors. In a shared memory system, 
processors cannot independently load and start system software; bootstrapping is 
controlled by the primary processor. 


3.3.3.1 Selection of Primary Processor 


The primary processor is selected by the console during system initialization prior 
to any access to main memory by any processor. Selection of the primary processor 
may be done in any fashion that guarantees choosing exactly one primary processor. 


Once a primary processor has been selected, the secondary processors take no further 
action until appropriately notified by the primary processor. In particular, secondary 
processors must not access main memory. 


See Section 3.7.3 for considerations for embedded console implementations.


3.3.3.2 Actions of Console


After selection, the console proceeds to bootstrap the primary processor using the
normal uniprocessor bootstrap as described in Section 3.3.1.


The console must correctly initialize all HWRPB fields used for synchronization or
communication between the processors. The console must initialize the PRIMARY
CPU ID field at HWRPB[32], zero the TXRDY and RXRDY bitmasks at HWRPB[296]
and HWRPB[304], and recompute the HWRPB checksum at HWRPB[288].


The console must also initialize each per-CPU slot for the secondary processors. The 
console must: 


1. Clear the BIP, RC, OH, and CV flags.

2. Clear the Halt Request code field.

3. Set the PP flag if the processor is present.

4. Set the PA flag if the processor is present and available for use by system
software.

5. Set the PMV and PL flags if the console has loaded PALcode on this processor.

6. Set the PV flag if the console has initialized PALcode on this processor.

7. Set the PE processor variation flag if the processor is eligible to become a primary.


After initializing each processor’s per-CPU slot, the console must notify each con- 
sole secondary processor of the existence and location of the valid HWRPB. See 
Section 3.7.3 for considerations for embedded console implementations. 


3.3.3.3 PALcode Loading on Secondary Processors


Most console implementations load PALcode on all secondary processors prior to 
bootstrapping the primary processor. Console implementations may delay the load- 
ing or initialization of PALcode on a secondary. If delayed, PALcode loading and 
initialization requires the cooperation of system software executing on the running 
primary and the console executing on behalf of the secondary. 


The console secondary must have performed any necessary initialization as described 
in Section 3.3.3.5. All interprocessor console communications follow the mechanisms 
described in Section 2.4. The operation proceeds as follows: 


1. The console secondary initializes the PALcode memory and scratch space length 
fields in its per-CPU slot. 


2. The console secondary sets the PALcode major revision, minor revision, and com- 
patibility subfields in the PALcode revision field in its per-CPU slot. 


3. The console secondary notifies the primary that PALcode loading is requested
by transmitting a ?PALREQ? message to the running primary as described in
Section 2.4.


4. The console secondary polls the PALcode Memory Valid ( PMV) flag in its per-CPU 
slot. 
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10. 


11. 


12. 


13. 


14. 


15. 


16. 


17. 


The running primary detects the console secondary request. 


The running primary verifies that the Processor Available (PA) flag is set in the 
secondary’s per-CPU slot. If not, the operation fails. 


The running primary compares the major and minor revision sub-fields of the 
PALcode revision field in its per-CPU slot to that in the secondary’s per-CPU 
slot. If the revisions levels do not match, the running primary proceeds to step 
12. 


The running primary compares the number of processors currently sharing its 
PALcode image to the maximum contained in the sub-field of the PALcode re- 
vision field of its per-CPU slot. If the current number is the maximum, no ad- 
ditional console secondary can share the PALcode image. The running primary 
proceeds to step 12. 


PROGRAMMING NOTE 
The running primary can determine the number of 
processors currently sharing a given PALcode im- 
age by counting the number of per-CPU slots with 
the same valid PALcode memory space descriptors. 
A PALcode memory space descriptor is valid if the 
PALcode Loaded (PL) flag is set in the per-CPU slot. 


The running primary copies the PALcode memory and scratch space descriptors 
from its per-CPU slot into the secondary’s per-CPU slot. 


The running primary copies the PALcode variation, compatibility, and maximum 
number of processors sub-fields of the PALcode revision field from its per-CPU 
slot into the secondary’s per-CPU slot. 


The running primary sets the PALcode Loaded (PL) flag in the secondary’s per- 
CPU slot, then proceeds to step 13. 


The running primary allocates physical memory for PALcode memory and scratch 
areas and records the addresses in the secondary’s per-CPU slot. 


The running primary sets the PALcode Memory Valid (PMV) flag in the sec- 
ondary’s per-CPU slot. 


The console secondary Siesises the PMV flag is set in its per-CPU slot. 


If the PL flag in its per-CPU slot is not set, the console secondary loads PALcode 
into the allocated PALcode memory and scratch space. In this case, the console 


secondary sets the PALcode Loaded (PL) flag in its per-CPU slot. 


The console secondary ensures that any required implementation-specific PAL- 
code initialization is performed. 


The console secondary sets the PALcode Valid (PV) flag in the secondary’s per- 
CPU slot. 


The PALcode memory and scratch space must be page aligned. If not allocated by the
console prior to system bootstrap, the allocation management of PALcode memory
for secondary processors is the responsibility of system software.


Note that it is the responsibility of system software to ensure that the PALcode
revision levels of all processors are compatible. This may be performed by the
primary prior to starting the secondary, by the starting secondary, or any combination
thereof. PALcode images of different revision levels are compatible if the PALcode
revision compatibility subfields match.
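
The sharing and compatibility checks described above can be sketched roughly as
follows. This is only an illustration: the Slot class and its field names are
hypothetical stand-ins for the per-CPU slot fields named in the text, not the
actual slot layout.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Hypothetical view of the per-CPU slot fields used in this sketch."""
    pl: bool = False      # PALcode Loaded (PL) flag
    pal_mem: int = 0      # PALcode memory space descriptor
    compat: int = 0       # PALcode revision compatibility sub-field
    max_share: int = 1    # maximum number of processors sharing the image


def sharers(slots, primary):
    """Count slots whose valid memory space descriptor matches the primary's."""
    return sum(1 for s in slots if s.pl and s.pal_mem == primary.pal_mem)


def can_share(slots, primary):
    """True if one more processor may share the primary's PALcode image."""
    return sharers(slots, primary) < primary.max_share


def compatible(a, b):
    """Images are compatible when the compatibility sub-fields match."""
    return a.compat == b.compat
```

A descriptor counts only when the PL flag is set, so an empty slot never
inflates the number of sharers.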


3.3.3.4 Actions of the Running Primary


System software executing on the primary processor must initialize the HWPCB for 
each secondary processor. The HWPCB contains the necessary privileged context 
for the execution of system software and successful restarts. The HWPCB must 
be initialized prior to requesting that the console secondary perform any START 
command. After initializing the HWPCB, system software sets the Context Valid 
(CV) flag. 


Once the PALcode is valid on a console secondary, the secondary waits for a START 
(or other) command from the running primary. System software issues the necessary 
console commands which instruct the secondary to begin executing software. The 
exchange of commands and messages between the running primary and a secondary 
is described in Section 2.4. 


PROGRAMMING NOTE 
Note that all commands sent to a console secondary are 
implicitly targeted to the secondary. No -CPU command 
qualifier is necessary. 


3.3.3.5 Actions of a Console Secondary


After failing to become the primary, a console secondary uses an implementation-
specific mechanism to determine when a valid HWRPB has been constructed in main
memory. The console secondary then locates the HWRPB in an implementation-
specific manner.


Once the HWRPB is located, the secondary locates its per-CPU slot using its CPU 
ID as an index. The secondary verifies that its slot exists by comparing its CPU ID 
to the number of per-CPU slots at HWRPB[144]. If its CPU ID exceeds the number 
of per-CPU slots, the secondary must not leave console mode or continue to access 
main memory. If PALcode loading is necessary, the console secondary follows the 
procedure given in Section 3.3.3.3. 
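
The slot existence check might look like the sketch below. It assumes the HWRPB
is available as a little-endian byte image and that CPU IDs are zero-based slot
indexes, so that an ID equal to the slot count is also rejected; neither
assumption is stated in the text. The helper names are hypothetical.

```python
import struct

NUM_SLOTS_OFFSET = 144  # HWRPB[144]: number of per-CPU slots (from the text)


def read_quad(hwrpb, offset):
    """Read one little-endian quadword from a HWRPB image."""
    return struct.unpack_from("<Q", hwrpb, offset)[0]


def slot_exists(hwrpb, cpu_id):
    """True if cpu_id indexes a per-CPU slot (assumes IDs 0..N-1 are valid)."""
    return 0 <= cpu_id < read_quad(hwrpb, NUM_SLOTS_OFFSET)
```

A secondary for which slot_exists returns False would remain in console mode
and stop accessing main memory.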


Once PALcode is valid, the console secondary waits for a START (or other) command 
from the running primary by polling the appropriate flag in the RXRDY bitmask. 
The exchange of commands and messages between the running primary and a
secondary is described in Section 2.4.


In response to a START command, the console secondary: 


1. Verifies that the Context Valid (CV) flag is set in its per-CPU slot.

2. Sets the Bootstrap-In-Progress (BIP) flag in its per-CPU slot.

3. Clears the Restart-Capable (RC) flag in its per-CPU slot.

4. Initializes the processor.

5. Loads the privileged context specified by the HWPCB in its per-CPU slot.

6. Loads the procedure value at HWRPB[264] into R27.

7. Clears R26 (return address) and R25 (argument information).

8. Loads the virtual address page table base (VPTB) register with the value stored
in HWRPB[272].

9. Transfers control to the CPU Restart routine, whose virtual address is stored in
HWRPB[256].


The CV flag indicates that the HWPCB in the slot contains valid hardware privileged 
state for system software. If the CV flag is not set, the processor remains in console
I/O mode.
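
As an informal illustration, the response to a START command can be modeled
with plain dictionaries. The dictionary keys, the function name, and the use of
cpu.clear() as a stand-in for processor initialization are assumptions made for
this sketch only; the HWRPB offsets are those given in the steps above.

```python
def respond_to_start(slot, hwrpb, cpu):
    """Walk the START steps; slot, hwrpb and cpu are plain dicts in this model."""
    if not slot["CV"]:
        return "console"            # HWPCB not valid: remain in console I/O mode
    slot["BIP"] = True              # set Bootstrap-In-Progress
    slot["RC"] = False              # clear Restart-Capable
    cpu.clear()                     # stand-in for processor initialization
    cpu["context"] = slot["HWPCB"]  # load privileged context from the HWPCB
    cpu["R27"] = hwrpb[264]         # procedure value
    cpu["R26"] = 0                  # return address
    cpu["R25"] = 0                  # argument information
    cpu["VPTB"] = hwrpb[272]        # virtual page table base
    cpu["PC"] = hwrpb[256]          # CPU Restart routine virtual address
    return "running"
```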


3.3.3.6 Bootstrap Flags


The Bootstrap-In-Progress (BIP) and Restart-Capable (RC) processor state flags in 
the console secondary processor’s per-CPU slot are used to control error recovery 
during secondary starts. If the secondary reenters console I/O mode while the BIP 
flag is set and the RC flag is clear, the start attempt fails. Failed starts are
equivalent to failed bootstraps, and the subsequent console action is determined by
Figure 3-1. See Section 3.3.1.4 and Table 3-4.


3.3.4 Addition of a Processor to a Running System 


A processor may be added to a running system at any time if a slot has been provided 
for it in the HWRPB. The new console secondary processor follows the secondary 
start procedure given in Sections 3.3.3.3 and 3.3.3.5 with one minor difference. If no 
PALcode loading is necessary, the console secondary sends a ?STARTREQ? message 
to the running primary. This message notifies the primary that a new processor 
has been added to the configuration. After sending the ?STARTREQ? message, the 
console secondary waits for a START (or other) command from the running primary. 
See Section 2.4 for a description of interprocessor console communication. 


3.3.5 System Software Requested Bootstraps 


System software can request that the console perform a system bootstrap. This 
request can be made on any processor in a multiprocessor system and overrides the 
setting of the AUTO_ACTION and BOOT_RESET environment variables. 


To request a bootstrap, system software sets one of the bootstrap requested codes 
in the Halt Request field of its per-CPU slot then executes a CALL_PAL HALT 
instruction. If a cold bootstrap is requested, the “Cold Bootstrap Requested” code (’2’) 
is set; the “Warm Bootstrap Requested” (’3’) code is set to request a warm bootstrap. 


Rather than the normal error halt processing described in Section 3.4.4, the console 
initiates the appropriate system bootstrap as described in Sections 3.3.1 and 3.3.2. 
The bootstrap attempt is unconditional; neither the AUTO_ACTION nor the BOOT_RESET
environment variable affects the bootstrap attempt.
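
A minimal sketch of such a request follows. The Halt Request codes 2 and 3 are
those given above, and code 1 is the SAVE_TERM/RESTORE_TERM exit code described
in Section 3.4.7; the enum, the dictionary slot, and the call_pal_halt parameter
are hypothetical names introduced for illustration.

```python
from enum import IntEnum


class HaltRequest(IntEnum):
    TERM_EXIT = 1   # "SAVE_TERM/RESTORE_TERM exit"
    COLD_BOOT = 2   # "Cold Bootstrap Requested"
    WARM_BOOT = 3   # "Warm Bootstrap Requested"


def request_bootstrap(slot, cold, call_pal_halt):
    """Record the request in the caller's per-CPU slot, then halt."""
    slot["halt_request"] = HaltRequest.COLD_BOOT if cold else HaltRequest.WARM_BOOT
    call_pal_halt()  # stand-in for executing CALL_PAL HALT
```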


3.4 System Restarts 


The console is responsible for restarting a processor halted by powerfail or by error 
halt. The console follows the same sequence for a primary or secondary processor. 


3.4.1 Actions of Console 


The console begins the restart sequence by locating and then validating the HWRPB
following the procedure given in Section 3.3.2.1. If the HWRPB is not valid, the
restart attempt fails. See Section 3.1.1 for console actions at major state transitions.


If the HWRPB is valid, the console uses the processor CPU ID as an index to calculate 
the address of that processor’s HWRPB slot. The console: 


1. Verifies that the processor’s PALcode Valid (PV) flag is set. If the PV flag is clear,
PALcode is not valid, and the restart attempt fails.


2. Verifies that the processor’s Context Valid (CV) flag is set. If the CV flag is
clear, the HWPCB does not contain valid software context for the restart, and
the restart attempt fails.


3. Examines the processor’s Restart-Capable (RC) flag. If set, the console proceeds
with the restart at step 5. If clear, system software is not capable of attempting
the restart, and the restart attempt fails.


4. Examines the Bootstrap-In-Progress (BIP) flag. If clear, and the AUTO_ACTION
environment variable is “BOOT” (544F 4F42₁₆), a system bootstrap is attempted.
Otherwise, the processor remains in console I/O mode. See Figure 3-1.


5. Loads the privileged context specified by the HWPCB in its per-CPU slot.


6. Loads the procedure value at HWRPB[264] into R27.


7. Clears R26 (return address) and R25 (argument information).


8. Loads the virtual address page table base (VPTB) register with the value stored
in HWRPB[272].


9. Transfers control to the CPU Restart routine, whose virtual address is stored in
HWRPB[256].


On all restart attempt failures the console initiates the action indicated by
Figure 3-1. Note that the PV and CV flags should never be clear for the primary
processor; if either flag is clear, then the restart fails. Also note that no PALcode or
system software is loaded during a restart.
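
The flag checks in steps 1 through 4 can be summarized in a small decision
function. This sketch reflects one reading of the steps, in which the BIP flag
is consulted only when the RC flag is clear; that ordering is an interpretation,
and the action actually taken on a failure is governed by Figure 3-1. The
function name and return strings are hypothetical.

```python
def restart_decision(pv, cv, rc, bip, auto_action):
    """Return the console's next action for one halted processor."""
    if not pv:
        return "fail: PALcode not valid"
    if not cv:
        return "fail: context not valid"
    if rc:
        return "restart"        # continue at step 5
    # RC clear: system software cannot handle a restart.
    if not bip and auto_action == "BOOT":
        return "bootstrap"
    return "console"            # remain in console I/O mode
```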


It is the responsibility of system software to complete the restart operation and to 
set the RC flag at the point where a subsequent restart can be handled correctly. 


3.4.2 Powerfail and Recovery - Uniprocessor


An Alpha system requires power to operate. The system power supply conditions 
external power and transforms it for use by the processor, memory, and I/O subsys- 
tems. Backup options are available on some systems to supply power after external 
power fails. The backup option may supply power to all of the system platform 
hardware, or only a subset. 


The effect of an external power failure depends on the backup option. 


1. If no backup option exists, the processor is not restartable after restoration of 
power. The processor must be bootstrapped or left halted in console I/O mode. 


2. If the backup option maintains power to all of the system platform hardware, 
execution of system software is unaffected by the power failure. It must be 
possible for system software to determine that a transition to backup power has 
occurred. 


3. If the backup option maintains only the contents of memory and keeps system 
time with the BB_WATCH, the power supply must request a powerfail interrupt. 
After requesting the interrupt, the power supply must continue to supply power 
to the processor for an implementation-specific period to allow system software 
to save state. 


In the last case, powerfail recovery is possible only if adequate system state is pre- 
served during an interruption of power to the processor. As explained in OpenVMS 
Section, Chapter 6, a powerfail interrupt is delivered at IPL 30 to the interrupt 
service routine located at SCB offset 640₁₆. System software must save all volatile
state and perform any operating system specific actions necessary to ensure later 
successful recovery. 


When power is restored, the console determines that the HWRPB is still valid, then 
examines the console lock and AUTO_ACTION environment variable. If the console
is locked, and the AUTO_ACTION environment variable is “RESTART”
(54 5241 5453 4552₁₆), the console attempts an operating system restart. See
Section 3.1.1.
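
The hexadecimal values quoted for “BOOT” and “RESTART” appear to be the ASCII
strings read as little-endian integers. The helper below, whose name is
hypothetical, reproduces them under that assumption.

```python
def env_value(text):
    """Interpret an ASCII string as a little-endian unsigned integer."""
    return int.from_bytes(text.encode("ascii"), "little")


print(f"{env_value('BOOT'):X}")     # 544F4F42
print(f"{env_value('RESTART'):X}")  # 54524154534552
```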


Note that the processor may lose state when power is lost. For example, if a processor 
is halted when power fails, the action on power up is still determined by the console 
switches and environment variables. The system does not necessarily stay halted. 


3.4.3 Powerfail and Recovery - Multiprocessor


There are two basic approaches to powerfail recovery on multiprocessor systems:


• United - all available processors effectively experience the powerfail event
identically.


• Split - each available processor effectively experiences independent powerfail
events.


A processor is “available” if the Processor Available (PA) flag is set in the processor’s 
per-CPU slot. The Powerfail system variation field at HWRPB[88] indicates the type
of powerfail and restart action. 


A multiprocessor Alpha system that supports powerfail recovery must implement
the united powerfail mode. The split mode may be optionally implemented as an
alternative selected at system bootstrap.


SOFTWARE NOTE
OpenVMS Alpha supports only the united powerfail and 
recovery mode at this time. Powerfail recovery is possi- 
ble only when the primary is restarted; all secondaries 
should remain in console I/O mode. 


3.4.3.1 United Powerfail and Recovery


In united powerfail and recovery mode, all available processors experience powerfail
interrupts, halts, and restorations uniformly. If one available processor experiences
a powerfail event, all other available processors experience that event. Therefore, if
one processor powerfails and recovers, all processors must do so. Even if a separately 
powered processor does not actually lose power, that processor will still receive the 
powerfail interrupt and must be restarted as if power had been lost. 


When power is restored and a restart is to be attempted, the console must determine 
whether to restart all available processors or only the primary processor. The console 
determines the appropriate action by the Powerfail Restart (PR) flag in the system 
variation field of the HWRPB[88]. If the PR flag is set, the console attempts to restart 
all available processors; if clear, the console attempts to restart only the primary 
processor. In both cases, it is the responsibility of system software to coordinate and 
synchronize further powerfail recovery. 


3.4.3.2 Split Powerfail and Recovery


In split powerfail and recovery mode, only the available processors that actually 
experience a loss of power will see a powerfail interrupt and subsequent recovery. 
Available processors that are separately powered and do not lose power do not see 
a powerfail interrupt. 


When power is restored and a restart is to be attempted, the console must determine 
whether to restart any available processor or only the primary processor. As in 
the united mode, the console determines the appropriate action by the Powerfail 
Restart (PR) flag in the system variation field of the HWRPB[88]. If the PR flag 
is set, the console attempts to restart any available processor. If clear, the console 
attempts to restart only the primary processor; on a secondary, the console sends 
the ?STARTREQ? message and waits for a START (or other command) from the 
running primary as discussed in Section 3.3.3.5. Again, system software has the 
responsibility for further coordination and synchronization of powerfail recovery. 
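
The effect of the PR flag in the two modes can be sketched as follows. The
function and parameter names are hypothetical, the PR flag is passed in as a
boolean because its bit position within HWRPB[88] is not given here, and
processors are represented simply as sets of CPU IDs.

```python
def powerfail_actions(mode, pr, available, primary, lost_power=()):
    """Return (CPUs the console restarts, CPUs that send ?STARTREQ?)."""
    if mode == "united":
        affected = set(available)                    # all available CPUs see the event
    elif mode == "split":
        affected = set(available) & set(lost_power)  # only CPUs that lost power
    else:
        raise ValueError(mode)
    restart = affected if pr else affected & {primary}
    # Only the split-mode text says a secondary sends ?STARTREQ? and waits.
    startreq = affected - restart if mode == "split" else set()
    return restart, startreq
```

In either mode, further coordination of the recovery is left to system
software, which this sketch does not model.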


3.4.4 Error Halt and Recovery 


There are a number of serious error conditions that prevent a processor from
executing the current thread of software. Such error conditions are detected by PALcode
and lead to the processor being halted.


The console must ensure that the processor hardware state when a halt is
encountered is visible to system software after a subsequent restart attempt and to the
console operator. This state includes the current values in PS, PC, SP, PCBB, HWPCB,
all integer registers, all floating point registers, and the name of the halt condition.
The console must:


1. Ensure that the contents of the integer and floating point registers appear unaf- 
fected. 


2. Write the current hardware context to the HWPCB located by the current PCBB.


3. Write the current PS, PC, PCBB register contents into the processor’s per-CPU
slot.


4. Write the current R25, R26, and R27 register contents into the processor’s
per-CPU slot.


5. Set appropriate code into the Reason For Halt field of the processor’s per-CPU
slot.


Note that the values of R25, R26, and R27 must be explicitly saved in the per-CPU 
slot to permit the console to follow the Alpha calling standard when invoking the 
CPU Restart routine. 


Section 3.1.1 and Table 2-3 list the defined halt conditions that transition an Alpha
processor from the running state to a halted state, and which may lead to an attempt 
to restart the processor. Each condition is passed to the operating system in the 
Reason For Halt quadword of the processor’s HWRPB slot. 


When an error halt occurs, the console examines the console lock setting. If the
console is locked, the console attempts a restart. If unlocked, the console action is
determined by the setting of the AUTO_ACTION environment variable; see Figure 3-1.
See Section 3.4.1 for a description of the restart attempt process.


The processor must be initialized after an error halt. If the processor starts running 
after an error halt without an intervening processor initialization, the operation of 
the processor is UNDEFINED. The effects of processor initialization are summarized 
in Table 3-5. 


An error halt directly affects only the processor that incurred one, although multiple 
processors may simultaneously and coincidentally incur their own error halt condi- 
tions. If restarts are enabled, each halted processor must be independently restarted 
by the console. The restarts of individual processors may occur in a different order 
than the error halts occurred, but if the console restarts any halted processor, it must 
restart all halted processors in a timely fashion unless a bootstrap is requested in 
the meantime. A bootstrap nullifies any pending restarts in the multiprocessor. 


3.4.5 Operator Requested Crash 


When the operating system does not respond to normal program requests, the console 
operator may request that the console request an operating system crash. A console 
requested crash differs from a console halt of a processor in that system software 
can write a crash dump. 


The console operator interacts with the console presentation layer and requests the
crash with a HALT -CRASH command. The console converts this command to an
error halt restart of system software. After gaining control of the processor, the console
preserves the hardware state; see Section 3.4.4. The console passes the crash request
to system software by using the “Console Operator requests system crash” code in 
the Reason For Halt field in the primary’s per-CPU slot. It is the responsibility of the 
system software restart routine to initiate the crash in an implementation-specific 
fashion. 


3.4.6 Primary Switching 


System software may find it necessary to replace the primary processor with one of
the running secondary processors without bootstrapping the system. This “switch”
of the running primary may be caused by an error encountered by the primary, or 
by a program request. Switching a running primary must be initiated by system 
software; the console cannot force a switch to occur. 


Support for primary switching is optional to system software, console implementa- 
tions, and system platforms. The system platform hardware must permit the se- 
lected secondary to assume the functions of a primary. The selected secondary must 
have direct access to the console, a BB_WATCH, and all I/O devices. Direct access to 
the console ensures that the secondary can access console I/O devices and the console 
terminal. Direct access to a BB_WATCH ensures that the secondary can act as the 
system timekeeper. Direct access to all I/O devices ensures that the secondary can 
initiate I/O requests to and receive I/O interrupts from all I/O devices, and that the 
secondary can reinitialize all devices as part of powerfail recovery. 


If the processor is eligible to become a primary, the console will set the Primary 
Eligible (PE) processor variation flag in the processor’s per-CPU slot during processor 
initialization.

Primary switching requires cooperation between system software and the console. 
System software is responsible for the selection of the new primary and any nec- 
essary redirection of I/O interrupts. The console is responsible for any necessary 
configuration of the console terminal or other console device interface. 


The sequence of events differs depending on the type of console implementation. On
a system with an embedded console, the operation proceeds as follows:


1. System software performs any actions specific to system software synchroniza- 
tion. 


2. System software executing on the old primary ensures that the console terminal 
is in a quiescent state. In particular, character reception from the terminal must 
be suspended. 


3. System software selects the new primary. The selected secondary must be eligible
as indicated by the PE processor variation flag in its per-CPU slot.


4. System software executing on the old primary invokes the PSWITCH console
callback specifying the “transition from primary” action.


5. The console attempts to perform any necessary hardware state changes to
transform the old primary into a secondary.


HARDWARE/SOFTWARE COORDINATION NOTE 
An example of such a hardware state change is dis- 
abling a console UART physically located on the pro- 
cessor board. 


6. If the state change is completed, PSWITCH returns success status. System
software may proceed with the primary switch at step 8.


7. If the state change is not effected, PSWITCH returns failure status. System 
software must take other appropriate action. 


8. System software executing on the old primary notifies system software on the 
selected secondary of the successful PSWITCH completion. 


9. System software executing on the selected secondary invokes the PSWITCH con- 
sole callback specifying the “transition to primary” action. 


10. The console verifies that the selected secondary is eligible to become a primary
and attempts to perform any necessary hardware state changes to transform
the old secondary into the new primary. An example of such a hardware state
change is draining the character FIFO and enabling a console UART physically
located on the processor board.


11. If the state change is completed, PSWITCH returns success status. System soft- 
ware may proceed with the primary switch at step 13. 


12. If the state change is not effected, PSWITCH returns failure status. System soft- 
ware must select a different potential primary or take other appropriate action. 


13. System software executing on the selected secondary reactivates the console ter- 
minal. In particular, character reception from the terminal is reenabled. 


14. System software performs any additional system reconfiguration, updates the
PRIMARY CPU ID field at HWRPB[32], recomputes the HWRPB checksum at
HWRPB[288], and performs any actions specific to system software
synchronization.


On a system with a detached console, the operation is similar, but only one call 
to PSWITCH is required. Additional calls to PSWITCH with the “switch primary” 
action may result in UNDEFINED operation. The operation proceeds as follows: 


1. System software performs any actions specific to system software synchroniza- 
tion. 


2. System software executing on the old primary ensures that the console
terminal is in a quiescent state. In particular, character reception from the
terminal must be suspended.


3. System software selects the new primary. The selected secondary must be eligible 
as indicated by the PE processor variation flag in its per-CPU slot. 


4. System software executing on any processor invokes the PSWITCH console
callback specifying the “switch primary” action and the CPU ID of the new primary.


5. The console verifies that the selected secondary is eligible to become a primary
and attempts to perform any necessary hardware state changes to transform the
old primary into a secondary and to transform the selected secondary into the
primary.


6. If the state change is completed, PSWITCH returns success status. System
software may proceed with the primary switch at step 9.


7. If the state change is not effected and the resulting hardware state permits a
return to system software, PSWITCH returns failure status. System software
must select a different potential primary or take other appropriate action.


8. If the state change is not effected and the resulting hardware state does not
permit a return to system software, the console takes the action associated with
a failed restart.


9. System software executing on the selected secondary reactivates the console ter- 
minal. In particular, character reception from the terminal is reenabled. 


10. System software performs any additional system reconfiguration, updates the
PRIMARY CPU ID field at HWRPB[32], recomputes the HWRPB checksum at
HWRPB[288], and performs any actions specific to system software
synchronization.
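
The difference between the two sequences lies mainly in how PSWITCH is
invoked, which the following sketch tries to bring out. Here pswitch(action,
cpu_id) is a hypothetical stand-in for the console callback that returns True on
success; the slots mapping and function names are likewise assumptions. Console
terminal quiescing, software synchronization, and the HWRPB updates are omitted.

```python
def switch_embedded(old, new, slots, pswitch):
    """Two-call sequence used with an embedded console."""
    if not slots[new]["PE"]:
        return False                      # selected secondary is not eligible
    if not pswitch("transition from primary", old):
        return False                      # old primary could not step down
    return pswitch("transition to primary", new)


def switch_detached(new, slots, pswitch):
    """Single-call sequence used with a detached console."""
    if not slots[new]["PE"]:
        return False
    return pswitch("switch primary", new)
```

On a False result, system software would select a different potential primary
or take other appropriate action, as the steps above describe.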


3.4.7 Saving and Restoring Console Terminal State During HALT/RESTART


Abrupt transitions from program I/O mode to console I/O mode may occur. Such
transitions may be caused by execution of a CALL_PAL HALT instruction, a catastrophic
error, or a console operator forcing the processor into console I/O mode. Upon tran- 
sition to console I/O mode, the console must be able to regain control of the console 
terminal, even though system software may have changed the device characteristics. 


The console may seize control of the console terminal without regard to system 
software when the transition is such that no return to program I/O mode is possible. 
Such transitions are normally associated with a catastrophic error. 


If system software execution may be continued, the console must be able to restore 
the existing state of the console terminal. The console must regain and subsequently 
relinquish control of the console terminal with the cooperation of system software. 


HARDWARE/SOFTWARE COORDINATION NOTE 
This is particularly desirable on workstations when the 
console operator forces the processor into console I/O 
mode. 


System software may provide SAVE_TERM and RESTORE_TERM routines which 
can be called by the console to save and restore the state of the console termi- 
nal. To provide these optional routines, system software loads the SAVE_TERM 
and RESTORE_TERM starting virtual address and procedure descriptor fields in 
the HWRPB, and recomputes the HWRPB checksum at HWRPB[288]. At system
bootstraps, the console sets these fields to zero. 


The console calls SAVE_TERM and RESTORE_TERM in kernel mode at IPL 31 
in the memory management policy established by system software. The console 
loads the routine procedure value into R27, clears R25 and R26, and then transfers 
control to system software at the starting virtual address. The procedure value 
and starting virtual address for SAVE_TERM are contained in HWRPB[224] and 
[232]; those for RESTORE_TERM are contained in HWRPB[240] and [248]. These 
routines are invoked only on the primary processor and only upon an unexpected 
entry into console I/O mode. Note that the console must preserve sufficient hardware 
state to permit the processor to be restarted prior to invoking these routines. See 
Section 3.4.4.


Exit from these routines must be accomplished by using the CALL_PAL HALT
instruction to return the processor to console I/O mode; these routines do not use the
RET subroutine return instruction. Prior to exit, these routines must set the
“SAVE_TERM/RESTORE_TERM exit” code (’1’) in the Halt Request field of the primary’s
per-CPU slot and indicate success (’0’) or failure (’1’) status in R0<63>. The console
will not attempt to continue system software in the event that a failure status is
returned.
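
The exit convention can be illustrated by the check a console might apply when
one of these routines halts. The function name is hypothetical, and treating a
missing exit code as a reason not to continue is an assumption of this sketch;
the text states only that a failure status prevents continuation.

```python
TERM_EXIT_CODE = 1  # "SAVE_TERM/RESTORE_TERM exit" Halt Request code


def console_may_continue(r0, halt_request):
    """True if the routine exited as expected and reported success."""
    failed = (r0 >> 63) & 1  # R0<63>: 0 = success, 1 = failure
    return halt_request == TERM_EXIT_CODE and not failed
```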


SAVE_TERM and RESTORE_TERM may be called when system software has en- 
countered an unexpected CALL_PAL HALT or other halt condition; system state 
may be corrupt. These routines must be written with little or no dependencies on 
possibly corrupt system state. 


HARDWARE/SOFTWARE COORDINATION NOTE 
A console terminal on a serial line may or may not have
state which needs to be saved. A console terminal on
a workstation may require the system software to “roll
down” the current screen to expose the “console window”
and “roll up” the “console window” to expose the current
screen.


3.4.7.1 SAVE_TERM - Save Console Terminal State


Format:

    status = SAVE_TERM

Inputs:

    None

Outputs:

    status = R0; status:

    R0<63>      ’0’   Success, terminal state saved.
                ’1’   Failure, terminal state not saved.
    R0<62:0>    SBZ


SAVE_TERM is called by the console after an unexpected entry to console mode. The
routine performs any implementation-specific and device-specific actions necessary
to save the state of the console terminal as established by system software. When
the routine exits and console I/O mode is restored, the console is free to modify the
existing console terminal state in any manner.


3.4.7.2 RESTORE_TERM - Restore Console Terminal State


Format:

    status = RESTORE_TERM

Inputs:

    None

Outputs:

    status = R0; status:

    R0<63>      ’0’   Success, terminal state restored.
                ’1’   Failure, terminal state not restored.
    R0<62:0>    SBZ


RESTORE_TERM is called by the console just prior to continuing system
software. The routine performs any implementation-specific and device-specific
actions necessary to restore the state of the console terminal as established by system
software.


3.4.8 Operator Forced Entry to Console I/O Mode 


The console operator can force a processor into console I/O mode with a HALT -CPU
command. When a processor enters console I/O mode in this way, the console sets
the Operator Halted (OH) flag in its per-CPU slot. The console does not update the
Reason For Halt or any other processor halt state in its per-CPU slot. The console
sets the OH flag only as the result of an explicit operator action; the OH flag is not set
on transitions to console I/O mode resulting from error halt conditions, powerfails,
CALL_PAL HALT instructions in kernel mode, console operator requests of a system
crash, or software directed processor shutdowns.


The console clears the OH flag prior to returning to program I/O mode as the result
of a CONTINUE or BOOT command. The console may clear the OH flag if an error halt
or operator-induced condition is encountered which precludes a subsequent
CONTINUE command. Such a condition is treated as an error halt; see Section 3.4.4.





3.5 Bootstrap Loading and Image Media Format 


An Alpha console may load a primary bootstrap image from one or more of the
device classes listed in Table 3-7. A given console implementation may support
any combination of the devices and protocols below; see Section 3.7.6. Subsequent
sections describe how the console locates, sizes, and loads the bootstrap image for
each device class.


Table 3-7: Bootstrap Devices and Image Media


Device Class       Data Link    Protocol

Local Disk         n/a          - Bootblock

Local Tape         n/a          - ANSI
                                - Bootblock

Network-like       NI,          - MOP
                   FDDI         - Bootp
                                - Bootparam
                                - SNMP
                                - CMIP

ROM                n/a          - ROM Bootblock

Console Storage    n/a          - Bootblock
                                - Implementation-specific

Serial             DDCMP        - MOP


As explained in Section 3.3.1.5, the console attempts to load a bootstrap image from
each element of a bootstrap device list until a successful image load is achieved. If
the bootstrap image cannot be located or if the load fails for any reason, the console
retains control of the system, generates the binary error message
AUDIT_BSTRAP_ABORT, and then attempts to load a bootstrap image from the next
bootstrap device list element. After a bootstrap image is successfully located and
loaded, the console transfers control to system software as described in Section 3.3.
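
The retry behavior over the bootstrap device list can be sketched as a simple
loop. The try_load and emit callables are hypothetical: try_load(device) is
assumed to return the loaded image or None on any failure, and emit(code) is
assumed to hand a binary message code to the console presentation layer.

```python
def load_bootstrap(device_list, try_load, emit):
    """Try each device in order; return (device, image) or None if all fail."""
    for device in device_list:
        image = try_load(device)
        if image is not None:
            return device, image       # console then transfers control
        emit("AUDIT_BSTRAP_ABORT")     # report the failure, try the next element
    return None                        # console retains control of the system
```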


As the bootstrap image load proceeds, the console optionally generates an audit trail
of messages indicating progress. The ENABLE_AUDIT environment variable
controls audit trail generation. The audit trail begins with the AUDIT_BOOT_STARTS
message. The audit trail continues with messages which are specific to the
bootstrap device. All message codes generated by the console are summarized in
Table 1-1; each consists of a binary message code which is interpreted by the console
presentation layer.


3.5.1 Disk Bootstrapping 


An Alpha primary bootstrap may be loaded from a directly accessed disk device. 
The console loads the “boot block” contained in the first logical block (LBN 0) of the 
disk. The boot block contains the starting logical block number (LBN) of the primary 
bootstrap program and the count of contiguous LBNs which make up that image. 
The first 512 bytes of the boot block are structured as shown in Figure 3-6. The
console loads the primary bootstrap without knowledge of the operating system file
system. The boot block is (previously) initialized by the operating system. The actual
size of a logical block is device-specific and may exceed 512 bytes. The
platform-specific quadword is unused by the operating system.


One intended use of this quadword is to permit a given console to boot another 
console which presents a different operating system interface. This quadword is 
intended for use only on locally connected disks which are not served to multiple, 
possibly nonhomogeneous, platforms. Note that neither OpenVMS nor OSF/1 supports
this quadword. In particular, the quadword is lost at disk initialization, is not written
as a part of bootstrap block update, and is not replicated on a backup or archive.


Figure 3-6: Alpha Boot Block


A local disk bootstrap proceeds as follows:

1. The console reads the boot block from LBN 0 of the specified disk device. 


2. The console validates the boot block CHECKSUM; if the checksum is not
validated, the bootstrap image load attempt aborts. The console computes the
checksum of the first 63 quadwords in the block as a 64-bit, 2's complement sum,
ignoring overflow. Note that the computation includes both reserved regions. The
computed checksum is compared to the CHECKSUM at [BB+504].


3. The console generates the AUDIT_CHECKSUM_GOOD message if the audit trail 
is enabled. 


4. The console ensures that the FLAG quadword is zero; otherwise the bootstrap
image load attempt aborts.


5. The console ensures that the COUNT is non-zero; otherwise the bootstrap image
load attempt aborts. The count field indicates the number of contiguous logical
blocks that contain the primary bootstrap.


6. The console generates the AUDIT_LOAD_BEGINS message if the audit trail is
enabled.


7. The console reads the primary bootstrap image specified by COUNT and
STARTING LBN into system memory; in the event of any error, the bootstrap image
load attempt aborts.


The transfer begins at the logical block given by the STARTING LBN; a
contiguous COUNT number of logical blocks is read. The image is read into
a virtually contiguous system memory buffer; the starting virtual address is
0000 0000 2000 0000₁₆. (See Section 3.3.1.3.)


Errors include device hardware errors, the specified STARTING LBN not being 
present on the disk, or unexpectedly encountering the last logical block on the 
disk during the read. 


8. The console generates the AUDIT_LOAD_DONE message when the load has 
completed; the message is generated only if the audit trail is enabled. 


9. The console prepares to transfer control to the bootstrap program as described 
in Section 3.3.1.7. 
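
The checksum test in step 2 can be sketched as follows. This is a minimal illustration rather than console code: the function names are hypothetical, and the little-endian quadword layout is an assumption that the text above does not state.

```python
import struct

BOOT_BLOCK_SIZE = 512          # size of the structured part of the boot block
MASK64 = (1 << 64) - 1         # keeps a sum within 64 bits (overflow ignored)


def boot_block_checksum(block: bytes) -> int:
    """Sum the first 63 quadwords, discarding any overflow past 64 bits."""
    # Assumption: quadwords are stored little-endian.
    quads = struct.unpack_from("<63Q", block, 0)
    return sum(quads) & MASK64


def checksum_is_valid(block: bytes) -> bool:
    """Compare the computed sum with the CHECKSUM quadword at [BB+504]."""
    if len(block) < BOOT_BLOCK_SIZE:
        return False
    (stored,) = struct.unpack_from("<Q", block, 504)
    return boot_block_checksum(block) == stored
```

Because the sum covers all 63 leading quadwords, the reserved regions are included without special handling, as step 2 requires.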


3.5.2 Tape Bootstrapping 




An Alpha primary bootstrap may be loaded from a directly accessed tape device. 
Prior to loading the primary bootstrap, the console must determine the tape format 
and locate the primary bootstrap on the tape. The console: 


1. Rewinds the tape on the specified tape device to the beginning of the tape (BOT). 
2. Reads the first record. 
3. Determines the record length. 


• If the record length is 80 bytes, the tape may be an ANSI-formatted tape.
The console proceeds as described in Section 3.5.2.1.


• If the record length is 512 bytes, the tape is "boot blocked". The console
proceeds as described in Section 3.5.2.2.


• If the length is other than 80 or 512 bytes, the bootstrap image load attempt
aborts.
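
The format decision in step 3 depends only on the length of the first record. A minimal sketch, with a hypothetical function name and return values chosen for the example:

```python
def classify_tape(first_record: bytes) -> str:
    """Classify a tape by the length of its first record."""
    length = len(first_record)
    if length == 80:
        # Only a candidate: the ANSI labels must still be verified.
        return "ansi"
    if length == 512:
        return "bootblock"
    # Any other length aborts the bootstrap image load attempt.
    raise ValueError(f"unsupported first record length: {length}")
```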


3.5.2.1 Bootstrapping from ANSI-formatted Tape


Prior to loading the primary bootstrap image from an ANSI-formatted tape, the 
console must ensure that the format is valid. To verify that a given record contains 
a particular ANSI label, the console checks for the ASCII label name string at the 
beginning of the record. For example, a record containing a VOL1 label begins with 
the ASCII string “VOL1”. All other record bytes are ignored when verifying the 
label. 


A primary bootstrap image filename may be specified explicitly on a BOOT command 
or implicitly by the BOOT_FILE environment variable. If no filename is specified, 
the first located file will be used. 


A local ANSI-formatted tape bootstrap proceeds as follows:


1. The console verifies that the first record contains a VOL1 label; if the verification
fails, the bootstrap image load attempt aborts.


2. The console generates the AUDIT_TAPE_ANSI message if the audit trail is
enabled.


3. If no filename was specified, the console advances the tape position to the
End-Of-Tape (EOT) side of the first tape mark. The console proceeds to step 5.


4. If a filename was specified, the console attempts to locate that file on the tape. If
the file cannot be located, the bootstrap image load attempt aborts. The console
compares the specified filename with the filename present in each HDR1 label
on the tape. At the first match, the console proceeds to step 5.


The console searches for the specified file starting with the second tape record. 


The console reads 80-byte records from the tape until it encounters an HDR1
label, then proceeds as follows:


a. The console generates the AUDIT_FILE_FOUND<filename> message, where
<filename> is the value of the HDR1 label. The message is generated only if
the audit trail is enabled.


b. The console compares the specified filename with the 17-character File
Identifier Field found in the HDR1 label.


c. If a match occurs, then the console advances the tape position to after the next
tape mark and proceeds to step 5. (Any HDR2 or HDR3 labels are ignored.)


d. If there is no match, then the console advances the tape position over the
next three tape marks and reads the next record. If another tape mark is
found, then the logical end of volume has been encountered and the bootstrap
image load attempt aborts. Otherwise the record should be the HDR1 label
for the next file on the tape and the console proceeds at step a.


The console aborts the bootstrap image load attempt whenever an unexpected 
tape mark is encountered, the tape runs off the end, or a hardware error occurs. 


5. The console generates the AUDIT_LOAD_BEGINS message if the audit trail is
enabled.


6. The console reads the primary bootstrap image from tape into system memory;
in the event of any error or if the tape runs off the end, the bootstrap image load
attempt aborts.


The transfer from tape begins at the current tape position and continues until
a tape mark is encountered. The image is read into a virtually contiguous
system memory buffer; the starting virtual address is 0000 0000 2000 0000₁₆. (See
Section 3.3.1.3.)


7. The console checks that the bootstrap file was properly closed by:


a. Reading the record after the tape mark and verifying that the record is an
EOF1 label. If not, the bootstrap image load attempt aborts.


b. Searching for a subsequent tape mark. If one is not found, the bootstrap file
was improperly closed and the bootstrap image load attempt aborts. (Any
EOF2 and EOF3 labels are ignored.)


8. The console generates the AUDIT_LOAD_DONE message if the audit trail is
enabled.


9. The console prepares to transfer control to the bootstrap as described in
Section 3.3.1.7. Note that the console does not rewind or otherwise change the
position of the tape after reading the bootstrap image.
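
The label checks used in this procedure can be sketched as below. The function names are hypothetical. Two details are assumptions not stated in this section: that the 17-character File Identifier Field occupies the bytes immediately after the 4-character "HDR1" label name (as in the ANSI tape label layout), and that the field is padded with trailing spaces.

```python
def has_label(record: bytes, name: str) -> bool:
    """Check for the ASCII label name at the start of the record.

    All other bytes of the record are ignored, as the text specifies.
    """
    return record.startswith(name.encode("ascii"))


def hdr1_matches(record: bytes, filename: str) -> bool:
    """Compare a filename with the File Identifier Field of an HDR1 label."""
    if len(record) != 80 or not has_label(record, "HDR1"):
        return False
    # Assumed layout: bytes 4..20 hold the 17-character File Identifier Field.
    field = record[4:21].decode("ascii", errors="replace")
    # Assumed padding: trailing spaces are not part of the filename.
    return field.rstrip(" ") == filename
```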


3.5.2.2 Bootstrapping from Boot Blocked Tape 


Bootstrapping from a boot blocked tape is similar to the local disk bootstrapping
described in Section 3.5.1. The first tape record must be 512 bytes, and must follow
the format given for disk boot blocks as shown in Figure 3-6. The STARTING LBN
and FLAGS fields are MBZ for tape boot blocks.


All tape records which comprise the primary bootstrap must be 512 bytes in size. If 
the console encounters records of any other size, the bootstrap image load attempt 
aborts. 


A local tape boot block bootstrap proceeds as follows:


1. The console generates the AUDIT_TAPE_BBLOCK message if the audit trail is
enabled.


2. The console validates the boot block CHECKSUM; if the checksum is not
validated, the bootstrap image load attempt aborts. The console computes the
checksum of the first 63 quadwords in the block as a 64-bit, 2's complement sum,
ignoring overflow. Note that the computation includes both reserved regions and
the MBZ fields. The computed checksum is compared to the CHECKSUM at
[BB+504].


3. The console generates the AUDIT_CHECKSUM_GOOD message if the audit trail
is enabled.


4. The console ensures that the COUNT is non-zero; otherwise the bootstrap image
load attempt aborts. The count field indicates the number of subsequent
512-byte records that contain the primary bootstrap.


5. The console generates the AUDIT_LOAD_BEGINS message if the audit trail is
enabled.


6. The console reads the COUNT subsequent records from the tape into system
memory. The bootstrap image load attempt aborts if the console encounters any
error, encounters any record size other than 512 bytes, or the tape runs off the
end.


The image is read into a virtually contiguous system memory buffer; the starting
virtual address is 0000 0000 2000 0000₁₆. (See Section 3.3.1.3.)


7. The console generates the AUDIT_LOAD_DONE message if the audit trail is
enabled.


8. The console prepares to transfer control to the bootstrap as described in
Section 3.3.1.7. Note that the console does not rewind or otherwise change the
position of the tape after reading the bootstrap image.

3.5.3 ROM Bootstrapping


An Alpha console may support bootstrapping from Read Only Memory (ROM).
Bootstrap ROM is assumed to appear in multiple discontiguous regions of the physical
address space. A given ROM region may contain multiple bootstrap images. A given
bootstrap image must not span ROM regions.


Each ROM bootstrap image is page aligned and begins with a boot block as shown
in Figure 3-7. The ROM boot block is similar to the local disk and tape boot block
shown in Figure 3-6.


Figure 3-7: Alpha ROM Boot Block


A ROM bootstrap proceeds as follows:


1. The console locates the specified ordinal ROM bootstrap image; if the bootstrap
image cannot be located, the bootstrap image load attempt aborts.


The console locates the ROM bootstrap image by searching ROM regions,
beginning with the ROM region with the lowest physical address and proceeding
upward to the ROM region with the highest physical address.


The search proceeds as follows:
a. The console verifies that the page contains a ROM bootstrap image:
• The low-order byte of the first quadword must be 80₁₆.


• The high-order longword of the first quadword must be the one's
complement of the low-order longword.


• The sixth quadword must contain the checksum of the first five
quadwords. The checksum is computed as a 64-bit, 2's complement sum,
ignoring overflow.


b. The console generates the AUDIT_BOOT_TYPE<string> message for each
valid bootblock, if the audit trail is enabled. The <string> is the ISO-LATIN-1
string contained in the BOOTSTRAP ID quadword.


c. If the specified ordinal image number has been reached, the console proceeds
to step 2.


d. Otherwise, the console uses the IMAGE LENGTH at [BB+24] to determine
the offset to the next ROM region page to be searched. The console repeats
the process at step a.


2. The console computes the starting physical address of the bootstrap image by 
adding the physical address OFFSET at [BB+16] to the starting physical address 
of the bootblock [BB]. 


3. The console verifies the accessibility of each page of the bootstrap image. If any 
page is inaccessible, the bootstrap image load attempt is aborted. 


4. The console generates the AUDIT_BSTRAP_ACCESSIBLE message if the audit 
trail is enabled. 


5. If requested, the console validates the IMAGE CHECKSUM; if the checksum is 
not validated, the bootstrap image load attempt aborts. The console computes 
the checksum of all quadwords in the bootstrap image as a 64-bit, 2’s complement 
sum ignoring overflow. The existence and implementation of the mechanism for 
requesting this validation is implementation-specific. 


6. The console generates the AUDIT_BSTRAP_GOOD message if the audit trail is 
enabled. 


7. If requested, the console copies the bootstrap image from ROM into system
memory (RAM). The image is copied into a virtually contiguous buffer starting at
virtual address 0000 0000 2000 0000₁₆. (See Section 3.3.1.3.) The console generates
the AUDIT_LOAD_BEGINS message before beginning the copy and the
AUDIT_LOAD_DONE message after the copy completes successfully, if the audit trail
is enabled.


8. The console prepares to transfer control to the bootstrap as described in
Section 3.3.1.7.

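
The three header checks in step 1a can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, quadwords are assumed to be little-endian, and the required low-order byte is taken to be 80 hexadecimal.

```python
import struct

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def rom_boot_block_is_valid(page: bytes) -> bool:
    """Apply the three ROM boot block header checks to the start of a page."""
    if len(page) < 48:                       # need six quadwords
        return False
    quads = struct.unpack_from("<6Q", page, 0)   # assumed little-endian
    first = quads[0]
    low, high = first & MASK32, first >> 32
    if first & 0xFF != 0x80:                 # assumed signature byte value
        return False
    if high != (~low & MASK32):              # one's complement of low longword
        return False
    # Sixth quadword holds the 64-bit sum of the first five, overflow ignored.
    return quads[5] == (sum(quads[:5]) & MASK64)
```

A search loop would call this on each candidate page and, for a valid block, use the IMAGE LENGTH at [BB+24] to find the next page to test.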
3.5.4 Network Bootstrapping 


An Alpha system may support bootstrapping over one or more network communi- 
cation devices and data link protocols. The console actions are dependent on the 
network device, data link protocol, and remote server capabilities. 


3.5.4.1 MOP-based Network Booting 


An Alpha system can be bootstrapped using the Digital Network Architecture
Maintenance Operations Protocol (MOP); see the MOP specification for a detailed
description.


The MOP bootstrap proceeds as follows: 


1. The console determines if a bootstrap filename is to be used. The filename is
taken from the BOOT command or the BOOT_FILE environment variable. If no
filename is specified on the BOOT command and BOOT_FILE is null, no filename
will be used.


2. The console generates the AUDIT_BOOT_REQ<filename> message if the audit
trail is enabled.


3. The console issues the appropriate MOP bootstrap request message(s).


4. The console receives an appropriate MOP response from a remote bootstrap
server. If no such response is received, the bootstrap image load attempt aborts.


5. The console generates the AUDIT_BSERVER_FOUND message if the audit trail
is enabled.


6. The bootstrap load proceeds following the MOP specification.


7. When the console receives the first portion of the bootstrap image, the console
generates the AUDIT_LOAD_BEGINS message if the audit trail is enabled.


8. The console loads the initial portion of the bootstrap image into a virtually
contiguous system memory buffer; the starting virtual address is
0000 0000 2000 0000₁₆. (See Section 3.3.1.3.)


9. When the bootstrap image has been loaded, the console generates the
AUDIT_LOAD_DONE message if the audit trail is enabled.


10. The console prepares to transfer control to the bootstrap program as described
in Section 3.3.1.7.


In the event of any error, the bootstrap image load attempt aborts.


3.5.4.2 BOOTP-UDP/IP Network Booting 
TBD. 


3.6 BB_WATCH 
The following lists important points about BB_WATCH:


1. BB_WATCH is the correct name for this entity. Although incorrect terminology,
T-O-Y, T-O-D-R, toy, todder, and watch chip, when used in an Alpha context, are
equivalent in meaning to BB_WATCH.


2. System software must directly manipulate the BB_WATCH through an
implementation-dependent interface.


3. System software decides where to acquire known time; if a BB_WATCH
is present, it may be used as the provider of known time.


4. Systems are not required to have a BB_WATCH.


SOFTWARE NOTE
However, all systems that support OpenVMS Alpha
or OSF/1 on Alpha must have one.





5. If a BB_WATCH is present in a system, it meets the following requirements:


• It has an accuracy of at least 50 ppm regardless of whether power is applied
to the system.


• It has a resolution of at least 1 second (that is, it is read and written in units
of a second or better).


• Changing the entirety of the time maintained by the BB_WATCH takes under
1 second.


• It has battery backup to survive the loss of power.


6. A BB_WATCH is always accessible to the primary processor. Stated another
way, a processor must be able to access a BB_WATCH directly (that is, without
going through another processor) in order to be a candidate for primary
processor.


7. The number of BB_WATCHes in a system is either one for the entire system or
one per processor in the system; which of the two options a system chooses
is implementation-dependent. If the latter option is chosen (one BB_WATCH per
processor), note that writing one BB_WATCH does not update another.


8. Although writing the BB_WATCH takes less than one second, it may not be a
fast operation. Software should avoid frequently writing the BB_WATCH lest it
negatively impact performance.


9. The processor and its PALcode never change the value of BB_WATCH except
under the direction of system software. (Note: the console, boot programs, and
remote console clients are not system software.) The console, its PALcode, and
any console application (including a diagnostic supervisor) never change
BB_WATCH except under the direction of the console operator, even when the CPU
is HALTED, the processor is being initialized, or the BB_WATCH has an invalid
time.


SOFTWARE NOTE
The format of time representation in the BB_WATCH
may vary from implementation to implementation. The
architecture requires, wherever possible, that when
system software writes a time value into the BB_WATCH,
the format of the time must conform to that of the
64-bit Absolute Time field of the 128-bit Digital Time
Service Standard (DTSS) Binary Time field, as described
in A-DG-ELEN112-00-0, Rev. A, 30-Jul-1987, which is
available from Digital Standards and Methods Control.
This absolute time format indicates the number of
100-nanosecond units that have elapsed since midnight
October 15, 1582 UTC, the beginning of the Gregorian
reform. Since the absolute time format is based on
Coordinated Universal Time (UTC, popularly known as
Greenwich Mean Time or GMT), it does not include a local time
offset.


If DTSS conformance is not possible in a particular
BB_WATCH, system software must at least use UTC-based
time when writing the BB_WATCH.


This is a pure and simple constraint on the operating
systems that use an Alpha system. It prevents an
operating system from updating the BB_WATCH in a way
that is incompatible with a subsequently booted operating
system on the same machine.


This requirement is waived for OpenVMS Alpha until
no later than the first release of OpenVMS Alpha after
1-Jan-1995, when it will then comply.
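
As an illustration of the absolute time format described in the note above, the count of 100-nanosecond units since midnight October 15, 1582 UTC can be computed as shown below. The function name is hypothetical; the sketch covers only the 64-bit count and does not model the remainder of the 128-bit DTSS Binary Time field or any BB_WATCH interface.

```python
from datetime import datetime, timezone

# Base date of the absolute time format: midnight, October 15, 1582 UTC.
DTSS_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def utc_to_absolute_time(moment: datetime) -> int:
    """Return the number of 100 ns units elapsed since the base date."""
    if moment.tzinfo is None:
        # A local time offset must never leak into the value.
        raise ValueError("moment must be timezone-aware")
    delta = moment.astimezone(timezone.utc) - DTSS_EPOCH
    whole_seconds = delta.days * 86400 + delta.seconds
    return whole_seconds * 10_000_000 + delta.microseconds * 10
```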


PROGRAMMING NOTE
The Primary-Eligible (PE) bit in the per-CPU slot of
the HWRPB for each processor indicates, among other
things, whether the CPU has access to a BB_WATCH.
See Appendix D.


The description of primary switching details the actions 
taken in a multiprocessor system, including the require- 
ment for the primary processor to have access to the BB_ 
WATCH. 


3.7 Implementation Considerations 


3.7.1 Memory Sizing, Testing, and Memory Data Descriptor Table


Alpha systems are allowed to have holes of unimplemented physical memory. The 
cluster mechanism allows all of available memory to be described in such a system 
without the need for creating bitmaps for unimplemented physical memory. 


Not every implementation can be required to test all of memory before booting the
operating system. Partial memory testing is recommended whenever testing is
time-consuming and would significantly delay the bootstrapping process; the choice is
implementation-specific. The highwater mark mechanism allows implementations
to completely size memory without testing all of it and to indicate to the operating
system where testing ended.


The rationale for flagging pages that test as having Corrected Read Data (CRD)
errors as bad pages is as follows:


1. Pages which have hard (repeatable) or soft (transient) CRD errors must be
reported as bad so that operating systems have the option of implementing a
user-directed policy over the use of these pages. For example, OpenVMS Alpha
customers with critical applications may want the operating system to use only
pages that test absolutely good.


2. Determining whether a page that tests as a CRD page is really a CRD page or
an RDS page is potentially a time-consuming operation. A page of this type must
have each bit held in a known state and all others put through a one-to-zero and
zero-to-one transition to determine if the page is CRD or RDS. Flagging CRD
pages as bad frees the console from doing this extensive testing and potentially
speeds up the booting process. The operating system can bury this testing time
with other tasks after it has been booted.


3. Typically the time between writing a test pattern to memory and reading it back
is on the order of microseconds. The probability is low that a transient CRD
error has occurred in this short time. Thus, pages testing as having CRD errors
probably have hard CRD errors, and it is not efficient to check these CRD pages
for the few times where the error is actually a transient error.


In some Alpha systems, it is expected that the console will attempt to partition 
physical memory into two clusters—one for the console and one for the operating 
system—and that all pages in the operating system cluster will be tested. Again, 
console implementations are strongly discouraged from testing all of memory if the 
booting process is significantly delayed. 


Clusters reserved for console and PALcode use do not have associated bitmaps. If 
such a cluster would contain a large number (3 or more) of contiguous pages which 
encounter soft read errors or are otherwise unsuitable for console and PALcode, the 
console should consider breaking the bad pages into a separate cluster. This cluster 
should be made available for use by system software which can possibly reclaim the 
pages for use. 


The PALcode function for flushing at least one page to memory (CFLUSH) may be 
used to aid in implementation of this system software function. (CFLUSH takes one 
argument, the PFN of the physical page to flush.) 


The console does not alter the Memory Data Descriptor Table or any bitmaps across 
warm bootstraps. This permits system software to propagate information on system 
software memory testing and intermittent errors across operating system bootstraps. 
For example, system software could set the “bad” bit of a page which incurred re- 
peated CRD errors. 


3.7.2 Bootstrap Flags 


The console uses the BIP and RC flags to detect failed bootstraps, starts, and restarts. 
The default response of the console is to take the least drastic action possible. The
console attempts a restart in preference to a bootstrap and attempts a bootstrap in 
preference to remaining in console I/O mode. 


BIP and RC are shared between the console and system software. There are two 
improbable cases of seemingly extraneous bootstrap attempts: 


1. Repeated power failures caused by a bouncing power supply. 
System software may not have sufficient time to set the RC flag. 


2. Intermittent hardware failures on a secondary processor.
The console executing on the secondary may force a system bootstrap due to a
failed restart of the secondary.


3.7.3 Embedded console 




In an embedded console implementation, the console executes on the same processor
as the operating system. In such an implementation, the state transitions as
experienced by the processor are more conceptual. For example, the processor acting as
the console will be executing instructions when in the halted state. The processor 
may also field console I/O mode exceptions and interrupts. 


An embedded console may be implemented as an extension of PALcode or as a dis- 
tinct software entity. The console may execute from dedicated RAM or ROM on the 
processor or, after console initialization, may execute from main memory. 


An embedded console implementation must include a mechanism by which the pri- 
mary processor can be forced into console I/O mode from program I/O mode. This 
enables the console operator to gain control of the system regardless of the state of 
the system software. See Section 1.2 for recommended and required mechanisms. 


3.7.3.1 Multiprocessor considerations


In a multiprocessor system, selection of the primary processor occurs prior to any 
access to main memory by any of the processors. At system cold start, each of the 
processors will be executing in console I/O mode. The necessary memory for console 
execution must be independent of main memory; the console must be executing from
dedicated console RAM or ROM and/or a suitably configured processor cache.


The selection of the console primary requires one or more hardware registers with 
state which is shared by all processors. One possible example is a mutex contained 
in a single-bit register accessed only with LDQ_L/STQ_C instructions. The processor
that successfully gains ownership of the mutex becomes the primary. Note that
implementations should include mechanisms for operator override of the selection
process and for recovery in the event that the selection process fails.


Once a console primary has been selected, the console secondaries take no further 
action until appropriately notified by the primary. In particular, console secondaries 
must not access main memory. The console primary has the responsibility of build- 
ing the HWRPB and any console-internal data structures (such as environment vari- 
ables) for the secondaries. When these structures have been initialized, the console 
primary must be able to signal one or more of the secondaries by additional hardware 
register(s). 


The console primary allocates a HWRPB in main memory, initializes it, and stores 
its physical address in an implementation-specific non-volatile manner. The console 
primary then indicates the presence of the HWRPB and its location to all secondaries 
by an implementation-specific mechanism. 


On system restarts, the console primary identifies itself by comparing its WHAMI 
register contents with the Primary CPU ID value stored in the HWRPB. 


When executing in console I/O mode, all processors must observe the same values
of all console environment variables. Of particular importance are the values of the
AUTO_ACTION and BOOT_RESET environment variables. After failing to become
the console primary processor, a console secondary waits to be notified that a valid 
HWRPB exists. Upon such notification by the primary, the console secondaries use 
the address provided by the primary to locate the HWRPB. The primary may be in 
either program I/O mode or console I/O mode. 


On cold bootstrap, a console secondary must not access main memory until notified 
by the primary that a valid HWRPB exists. Thus, there must exist a non-main 
memory based mechanism by which the primary may signal each of the secondaries. 
On warm bootstrap or restart, a secondary processor must locate its per-CPU slot 
in the HWRPB and poll its RXRDY bit. 


Console processors must locate the HWRPB without searching memory; such a
search constitutes a security hole. One possible implementation is to use an environ- 
ment variable or other shared console data structure. The address of the HWRPB 
must be non-volatile across power failures in systems which support powerfail re- 
covery. 


Console implementations which support SAVE_ENV must be capable of executing 
the routine simultaneously on each processor. System software use of SAVE_ENV 
requires care. System software must invoke SAVE_ENV on all available processors, 
but cannot ensure that the non-volatile storage is updated on processors which are 
not available at the time of update. In the event of mismatch, the console uses the 
non-volatile values preserved by the primary processor. 


3.7.4 Detached console 


In a detached console implementation, the console executes on a separate and dis- 
tinct hardware platform. A detached console may have cooperating special code 
which executes on one of the processors in the system configuration. 


Detached console implementations should provide some sort of keep-alive function. 
System software should be able to detect failures of the path between the system 
platform and the console. This may be a single dedicated signal or may be periodic 
message exchange. System software should be capable of continuing to execute in 
the event of a keep-alive failure and restoration of the connection (or console state) 
should not cause a system crash or other major state transition. The console should 
buffer any messages in the event of a keep-alive failure until reconnection occurs. 


Detached consoles may maintain a local console log. The logging device and format 
are implementation-specific. 
3.7.5 Goals of the Bootstrap Address Space 


The bootstrap address space established by the console for executing the primary
bootstrap is specifically tailored to address the goals and needs of system software
supported by Alpha, as listed here:


• The address space cannot exceed the reach of our supported implementation
languages. In particular, page table address space must be reachable.


• The address space layout should not create conflicts for system software. The
immediate addressing needs of system software must be accommodated.


• Page table simplicity is desirable, but not to the extent that bogus translation
paths are created.


• The address space layout must be flexible to ensure that the previous goals
are met and to ensure a growth path for future console and primary bootstrap
needs.


• The address space layout should not preclude bootstrapping an operating system
which supports full 64-bit addressing.


Several alternatives were considered for implementing a bootstrap address space. 
One scheme that was considered involved having a single, triply-mapped page table. 
This scheme introduced address space conflicts which unnecessarily placed imple- 
mentation restrictions on the primary bootstrap program. A variation of this which 
created additional page tables, all of which were naturally located in virtual memory,
eliminated the bogus translation paths but didn’t solve the address space conflicts 
created for the primary bootstrap program. 


The chosen design solves all of these problems through careful location of page table 
address space. The location of page table address space naturally excludes the level 
1 page table from virtual memory, but this is not a problem for software. The chosen 
design incurs no additional page table complexity or memory usage over any triply- 
mapped scheme that doesn’t also introduce bogus translation paths. 


3.7.5.1 Address Space must be reachable 


There exists system software (OpenVMS Alpha Phase 1) which is implemented using
32-bit oriented languages. Such software is limited to a 32-bit address space subset
modeled after the VAX address space and supported by Alpha longword arithmetic 
operations. This is not to say that the remainder of the Alpha virtual address space 
is inherently unavailable, only that the software implementation language imposes 
a restriction on the amount of the Alpha address space that can be reached by that 
particular software. 


A requirement immediately emerges that the bootstrap address space in which sys- 
tem software executes must be “32-bit oriented”. Valid potential bootstrap address 
space can only consist of the first and last 2GB (due to longword sign-extension) of
the Alpha 64-bit virtual memory space.
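
The reachability rule can be illustrated with a minimal sketch, assuming that a "32-bit oriented" address is any 64-bit virtual address that equals the sign extension of its own low longword. The helper names are hypothetical and are not part of the architecture.

```python
MASK64 = (1 << 64) - 1
TWO_GB = 2 << 30


def sign_extend_longword(value32):
    """Sign-extend a 32-bit longword to a 64-bit (unsigned) virtual address."""
    value32 &= 0xFFFFFFFF
    if value32 & 0x80000000:
        return value32 | 0xFFFFFFFF00000000
    return value32


def is_reachable_32bit(va):
    """True if the 64-bit VA equals the sign extension of its low longword."""
    va &= MASK64
    return sign_extend_longword(va) == va


# Addresses in the first 2GB and the last 2GB pass; everything between fails.
for va in (0, TWO_GB - 1, TWO_GB, (1 << 64) - TWO_GB, (1 << 64) - 1):
    print(hex(va), is_reachable_32bit(va))
```

Only the two 2GB regions at the extremes of the 64-bit space satisfy the check, which is the constraint stated above.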


3.7.5.2 The coarseness effect 


A triply-mapped page table scheme of any kind imposes extreme coarseness upon the 
location of page table space. Consider Table 3-8, which shows the locations of page 
table space for a triply-mapped page table using different L1PTEs for self-mapping: 


Table 3-8: Page Table Coarseness Effect

                               Page Size
L1PTE
Number    8KB          16KB          32KB           64KB
0         0            0             0              0
1         8GB          64GB          0.5TB          4TB
2         16GB         128GB         1.0TB          8TB
3         24GB         192GB         1.5TB          12TB
4         32GB         256GB         2.0TB          16TB
Last      2**64-8GB    2**64-64GB    2**64-0.5TB    2**64-4TB


Self-mapping in any L1PTE other than the first L1PTE would locate page table
space at an address that a 32-bit oriented language cannot reach.
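
The entries of Table 3-8 follow from simple arithmetic. The sketch below assumes 8-byte PTEs (an assumption consistent with the 8KB-page row mapping 8GB per L1PTE) and uses hypothetical function names.

```python
PTE_SIZE = 8  # bytes per page table entry (assumption, see lead-in)


def l1pte_span(page_size):
    """Bytes of virtual address space mapped by one level 1 PTE."""
    ptes_per_page = page_size // PTE_SIZE
    return ptes_per_page * ptes_per_page * page_size


def page_table_space_base(l1pte_number, page_size):
    """Base VA of page table space when self-mapping through a given L1PTE."""
    return l1pte_number * l1pte_span(page_size)


for kb in (8, 16, 32, 64):
    base = page_table_space_base(1, kb * 1024)
    print(f"{kb}KB pages: self-map in L1PTE 1 starts at {base >> 30} GB")
```

Because every non-zero row is a multiple of at least 8GB, only L1PTE 0 yields a base inside the first 2GB.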


3.7.5.3 Address Space must not create conflicts 
3.7.5.3.1 Location of Page Table Space 


As was noted above, the only reachable location for page table address space utilizes
the first L1PTE, thus locating page table address space at a region beginning at
address zero and extending at least to address 8GB-1. This creates an immediate
addressing conflict since no reachable address space is left over for system software
itself or for console-mapped structures and code.


A finer grained virtual address layout is therefore required, one in which the
self-mapping that establishes reachable page table address space is done at page table
level 2 instead of level 1. A level 1 page table would exist which is entirely empty
except for the first L1PTE. The first L1PTE would point to a separate level 2 page
table. A PTE within the level 2 page table would be used for self-mapping, thus
locating page table address space at a finer grained location (within the total address
space mapped by the single L1PTE) than would be otherwise possible. With this
approach, page table space could be located within the entire 64-bit address space
as shown in Table 3-9.


Table 3-9: Page Table Space Location

                              Page Size
L1PTE / L2PTE
Numbers          8KB      16KB      32KB      64KB
0/0              0        0         0         0
0/1              8MB      32MB      128MB     0.5GB
0/2              16MB     64MB      256MB     1.0GB
0/3              24MB     96MB      384MB     1.5GB
0/4              32MB     128MB     512MB     2.0GB


This table shows that self-mapping using any of the first four L2PTEs will define a 
reachable location for page table address space (anywhere in the first 2GB), regard- 
less of page size. 


Table 3-10 shows the size of page table address space as a function of page size. 
Any space in the first 2GB of virtual memory that is not part of page table address 
space is available for other uses. 


Table 3-10: Page Table Address Space as Function of Page Size

Page Size    Length of Page Table Space
8KB          8MB
16KB         32MB
32KB         128MB
64KB         512MB
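
As a rough check of Tables 3-9 and 3-10, the sketch below computes the base and length of page table space for a level 2 self-map under the first L1PTE. It assumes 8-byte PTEs and treats a location as reachable only when the whole region fits in the first 2GB. The function names are illustrative.

```python
PTE_SIZE = 8          # bytes per PTE (assumption consistent with Table 3-10)
TWO_GB = 2 << 30


def l2pte_span(page_size):
    """Bytes mapped by one level 2 PTE; also the length of page table space."""
    return (page_size // PTE_SIZE) * page_size


def page_table_space(l2pte_number, page_size):
    """Return (base, length) for a self-map in the given L2PTE under L1PTE 0."""
    length = l2pte_span(page_size)
    return l2pte_number * length, length


def reachable(l2pte_number, page_size):
    """True if the entire page table space lies within the first 2GB."""
    base, length = page_table_space(l2pte_number, page_size)
    return base + length <= TWO_GB


for kb in (8, 16, 32, 64):
    base, length = page_table_space(3, kb * 1024)
    print(f"{kb}KB pages, L2PTE 3: base {base >> 20} MB, length {length >> 20} MB")
```

With 64KB pages, L2PTE 3 places page table space at 1.5GB with a length of 0.5GB, which just fits below 2GB. L2PTE 4 does not fit, which matches the statement that the first four L2PTEs are usable regardless of page size.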


Self-mapping at level 2 naturally excludes the L1 page table from the defined page 
table address space. Self-mapping at level 2 merely establishes an address space 
within the context of whatever L1PTE is used to map the level 2 page table.


Either the level 1 page table can be mapped to some arbitrary, yet architected, VA 
outside of page table address space, or it can be left unmapped by the console, or 
another address space can be created through self-mapping at level 1 which would 
naturally include the level 1 page table. The need to support virtual PTE lookup 
during Translation Buffer miss processing dictates the third choice. 


Thus the second L1PTE is used to map the level 1 page table itself. Note from the
discussions above that this creates a second address space for the page tables which is
not reachable from 32-bit oriented software. Such software will use the finer grained
page table space created by the self-map technique at level 2.


3.7.5.3.2 Laying out the first 2GB 


Bootstrap address space can be laid out once a location is chosen for page table space.
The four natural locations for page table space that would be expressible regardless
of page size are found in the column above for the 64KB page size. These locations
are 0, 0.5GB, 1.0GB and 1.5GB. After reserving location zero for software use, any
one of the remaining three locations could be chosen. Remaining address space in
the first 2GB could then be allocated for other purposes. The final layout of the first
2GB of address space is described in Section 3.3.1.3.


3.7.5.4 Conclusion 


The needs of more restrictive implementation languages can be met by utilizing the 
natural flexibility of the Alpha multi-level page tables. This can be done without 
undue complexity or memory usage, and without precluding the use of any less 
restrictive language used to implement a ’64-bit’ operating system. 


3.7.6 Bootstrap Devices and Image Media 


Various factors should be considered when determining which of the bootstrap de- 
vices and image media listed in Table 3-7 are supported. 


1. Workstations and other low-end platforms may consider supporting ROM bootblocks
for DEC OSF/1 and customer applications. OpenVMS Alpha currently uses a single
bootstrap image for all platforms; support for ROM bootstrapping will require
customization. See Section 3.5.3 for the ROM Bootblock mechanism.


2. Platforms considering bootstrap media which is local to the console must nego- 
tiate with the operating systems for such support on a case-by-case basis. DEC 
OSF/1 supports the bootblock method; see Section 3.5.1. 


3. Products intended for embedded systems applications should consider DDCMP 
/MOP support. 


Support for audit trail generation during console bootstrap is strongly recommended 
to all implementations. An audit trail is essential to the isolation of errors during 
the bootstrap process. Section 3.5 gives the architected audit trail for each bootstrap
device. Console implementations may generate additional audit trail messages. 


3.7.6.1 Disk Bootstrapping 


Note that unlike the VAX boot block support, NO code is contained in the boot block; 
the boot block contains ONLY the LBN descriptor for the Alpha primary bootstrap 
image. Also note that an Alpha boot block can contain pointers to primary bootstrap
images for both VAX and Alpha simultaneously. 


Because the boot block includes an LBN and block count, the console need have no 
knowledge of the operating system file system or on-disk structure. 


The first 136 bytes of the boot block are currently used by the VAX disk boot block
mechanism. The next 80 bytes are not currently used either by VAX or Alpha boot
blocks. For future expansions, VAX boot blocks should expand towards higher
addresses, and Alpha boot blocks expand towards lower addresses; each region remains
contiguous. These 216 bytes are ignored by the Alpha console except for the purposes
of computing the bootblock checksum.


The boot block FLAGS word is reserved for future expansion. Flag<0> is reserved
to indicate a discontiguous bootstrap image; Flag<63:1> are reserved for future
definition. There are no current plans by any Digital operating system to use a
discontiguous primary bootstrap image.


3.7.6.2 ROM Bootstrapping


A ROM block is uniquely identified as containing an Alpha bootstrap image by the
value of 0080 (hex) in the first word of the block. Each ROM bootstrap image is
identified by a zero-based ordinal number.


The size of the ROM bootstrap is specified in bytes to permit the same ROM bootstrap
image to be used by systems with different page sizes.


Other alternatives to specify which ROM bootstrap to use were considered, such as
making the console operator give the physical address of the ROM bootstrap. The
specified method seemed the least complicated. A console implementation should
consider a command which permits the location of ROM bootstraps, their IDs, and
PR assignments to be displayed.


Note that the specified searching process ensures that improperly created ROM
bootblocks are ignored by the console.


3.7.6.3 Network Bootstrapping


Data link protocols include CSMA/CD (IEEE 802.3 and Ethernet), Token-passing 
Bus (IEEE 802.4), Token-ring (IEEE 802.5), HDLC, and DDCMP. It is strongly rec- 
ommended that a console implementation support both BOOTP-UDP/IP and MOP 
protocols over all supported network devices and data links. 





3.8 REVISION HISTORY
Revision 5.0, May 12, 1992
Removed references to ELN
ULTRIX -> DEC OSF/1
Widget -> device
Added ECO #30 text part
Material rearranged according to SRM Rev 5 requirements
Added ECO #17, #23
Converted to SDML.
Replace previous Console Chapter with Console ECO #15


Includes 3 chapters and two appendices, renumber I/O Chapter 
10. Material substantially changed or rearranged 


The following appendixes are included in the Alpha System Reference Manual:
• Appendix A, Software Considerations
• Appendix B, IEEE Floating-Point Conformance
• Appendix C, Instruction Encodings
• Appendix D, Registered System and Processor Identifiers
• Appendix E, Registered Console Implementation Functions


Appendix A
Software Considerations 


A.1 Hardware-Software Compact 


The Alpha architecture, like all RISC architectures, depends on careful attention to 
data alignment and instruction scheduling to achieve high performance. 


Since there will be various implementations of the Alpha architecture, it is not 
obvious how compilers can generate high-performance code for all implementations. 
This chapter gives some scheduling guidelines that, if followed by all compilers and 
respected by all implementations, will result in good performance. As such, this 
section represents a good-faith compact between hardware designers and software 
writers. It represents a set of common goals, not a set of architectural requirements. 
Thus, an Appendix, not a Chapter. 


Many of the performance optimizations discussed below are advantageous only for 
frequently executed code. For rarely executed code, they may produce a bigger 
program that is not any faster. Some of the branching optimizations also depend on 
good prediction of which path from a conditional branch is more frequently executed. 
These optimizations are best done by using an execution profile, either an estimate 
generated by compiler heuristics, or a real profile of a previous run, such as that 
gathered by PC-sampling in PCA. 


Each computer architecture has a “natural word size.” For the PDP-11, it is 16 bits; 
for VAX, 32 bits; and for Alpha, 64 bits. Other architectures also have a natural word 
size that varies between 16 and 64 bits. Except for very low-end implementations, 
ALU data paths, cache access paths, chip pin buses, and main memory data paths 
are all usually the natural word size.


As an architecture becomes commercially successful, high-end implementations 
inevitably move to double-width data paths that can transfer an aligned (at an even 
natural word address) pair of natural words in one cycle. For Alpha, this means 
eventual 128-bit wide data paths. It is hard to get much speed advantage from paired
transfers unless the code being executed has instructions and data appropriately 
aligned on aligned octaword boundaries. Since this is hard to retrofit to old code, 
the following sections sometimes encourage “over-aligning” to octaword boundaries 
in anticipation of high-speed Alpha implementations. 


In some cases, there are performance advantages in aligning instructions or data 
to cache-block boundaries, or putting data whose use is correlated into the same 
cache block, or trying to avoid cache conflicts by not having data whose use is 
correlated placed at addresses that are equal modulo the cache size. Since the 
Alpha architecture will have many implementations, an exact cache design cannot 
be outlined here. Nonetheless, some expected bounds can be stated. 


1. Small (first-level) cache sizes will likely be in the range 2 KB to 64 KB
2. Small cache block sizes will likely be 16, 32, 64, or 128 bytes
3. Large (second- or third-level) cache sizes will likely be in the range 128 KB to
8 MB
4. Large cache block sizes will likely be 32, 64, 128, or 256 bytes
5. TB sizes will likely be in the range 16 to 1024 entries


Thus, if two data items need to go in different cache blocks, it is desirable to
make them at least 128 bytes apart (modulo 2 KB). Doing that creates a high
probability of allowing both items to be in a small cache simultaneously, for all
Alpha implementations.
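
The separation guideline can be expressed as a small check. This is a sketch under the bounds stated above (smallest first-level cache of 2 KB, largest small-cache block of 128 bytes). The function name is hypothetical, and measuring distance in both directions around the 2 KB wrap is an interpretation of "modulo 2 KB".

```python
SMALL_CACHE_MIN = 2 * 1024   # smallest expected first-level cache, in bytes
MAX_SMALL_BLOCK = 128        # largest expected first-level block, in bytes


def likely_in_different_blocks(addr_a, addr_b):
    """Guideline from the text: at least 128 bytes apart, modulo 2 KB."""
    delta = (addr_a - addr_b) % SMALL_CACHE_MIN
    distance = min(delta, SMALL_CACHE_MIN - delta)  # wraparound distance
    return distance >= MAX_SMALL_BLOCK


print(likely_in_different_blocks(0x1000, 0x1080))  # 128 bytes apart
print(likely_in_different_blocks(0x1000, 0x1800))  # 2 KB apart, same index
```

Two items exactly 2 KB apart fail the check because they would map to the same location in the smallest direct-mapped cache considered.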


In each case below, the performance implication is given by an order-of-magnitude
number: 1, 3, 10, 30, or 100. A factor of 10 means that the performance difference 
being discussed will likely range from 3 to 30 across all Alpha implementations. 


A.2 Instruction-Stream Considerations 
The following sections describe considerations for the instruction stream. 


A.2.1 Instruction Alignment 


Code PSECTs should be octaword-aligned. Targets of frequently taken branches 
should be at least quadword-aligned, and octaword-aligned for very frequent loops. 
Compilers could use execution profiles to identify frequently taken branches. 


Most Alpha implementations will fetch aligned quadwords of instruction stream (two 
instructions), and many will waste an instruction-issue cycle on a branch to an odd 
longword. High-end implementations may eventually fetch aligned octawords, and 
waste up to 3 issue cycles on a branch to an odd longword. Some implementations 
may only be able to fetch wide chunks of instructions every other CPU cycle. 
Fetching four instructions from an aligned octaword can get at most one cache miss, 
while fetching them from an odd longword address can get 2 or even 3 cache misses. 


Quadword I-fetch implementors should give first priority to executing aligned 
quadwords quickly. Octaword-fetch implementors should give first priority to 
executing aligned octawords quickly, and second priority to executing aligned 
quadwords quickly. Dual-issue implementations should give first priority to issuing
both halves of an aligned quadword in one cycle, and second priority to buffering
and issuing other combinations.


A.2.2 Multiple Instruction Issue — Factor of 3


Some Alpha implementations will issue multiple instructions in a single cycle. To 
improve the odds of multiple-issue, compilers should choose pairs of instructions to 
put in aligned quadwords. Pick one from column A and one from column B (but only 
a total of one load/store/branch per pair). 


Column A               Column B

Integer Operate        Floating Operate
Floating Load/Store    Integer Load/Store
Floating Branch        Integer Branch
                       BR/BSR/JSR


Implementors of multiple-issue machines should give first priority to dual-issuing at 
least the above pairs, and second priority to multiple-issue of other combinations. 
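
A compiler could encode the pairing guideline as a simple predicate. The sketch below is illustrative only: the instruction class names are hypothetical labels for the table entries, and it assumes the order of the two instructions within the quadword does not matter, which the text does not state.

```python
COLUMN_A = {"integer_operate", "floating_load_store", "floating_branch"}
COLUMN_B = {"floating_operate", "integer_load_store", "integer_branch",
            "br_bsr_jsr"}
# Classes that count toward the "one load/store/branch per pair" limit.
MEMORY_OR_BRANCH = {"floating_load_store", "floating_branch",
                    "integer_load_store", "integer_branch", "br_bsr_jsr"}


def can_pair(first, second):
    """True if the two instruction classes follow the column A/B guideline."""
    one_from_each = ((first in COLUMN_A and second in COLUMN_B) or
                     (first in COLUMN_B and second in COLUMN_A))
    limited = sum(cls in MEMORY_OR_BRANCH for cls in (first, second))
    return one_from_each and limited <= 1


print(can_pair("integer_operate", "floating_operate"))
print(can_pair("floating_load_store", "integer_load_store"))
```

The second call is rejected because the pair would contain two load/store instructions, even though it takes one entry from each column.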


In general, the above rules will give a good hardware-software match, but compilers 
may want to implement model-specific switches to generate code tuned more exactly 
to a specific implementation. 


A.2.3 Branch Prediction and Minimizing Branch-Taken — Factor of 3 


In many Alpha implementations, an unexpected change in I-stream address will 
result in about 10 lost instruction times. “Unexpected” may mean any branch-taken 
or may mean a mispredicted branch. In many implementations, even a correctly 
predicted branch to a quadword target address will be slower than straight-line 
code. 


Compilers should follow these rules to minimize unexpected branches: 


1. Implementations will predict all forward conditional branches as not-taken, 
and all backward conditional branches as taken. Based on execution profiles, 
compilers should physically rearrange code so that it has matching behavior. 


2. Make basic blocks as big as possible. A good goal is 20 instructions on average 
between branch-taken. This means unrolling loops so that they contain at least 
20 instructions, and putting subroutines of less than 20 instructions directly in 
line. It also means using execution profiles to rearrange code so that the frequent 
case of a conditional branch falls through. For very high-performance loops, it 
will be profitable to move instructions across conditional branches to fill otherwise 
wasted instruction issue slots, even if the instructions moved will not always do 
useful work. Note that the Conditional Move instructions can sometimes be used 
to avoid breaking up basic blocks. 


3. In an if-then-else construct whose execution profile is skewed even slightly away
from 50%-50% (51-49 is enough), put the infrequent case completely out of line, 
so that the frequent case encounters zero branch-takens, and the infrequent case 
encounters two branch-takens. If the infrequent case is rare (5%), put it far 
enough away that it never comes into the I-cache. If the infrequent case is 
extremely rare (error message code), put it on a page of rarely executed code and 
expect that page never to be paged in. 


4. There are two functionally identical branch-format opcodes, BSR and BR.


[Figure: Branch Format, with fields at bits <31:26>, <25:21>, and <20:0>]


Compilers should use the first one for subroutine calls, and the second for GOTOs.
Some implementations may push a stack of predicted return addresses for BSR
and not push the stack for BR. Failure to compile the correct opcode will result
in mispredicted return addresses, and hence make subroutine returns slow.


5. The memory-format JSR instruction has 16 unused bits. These should be used
by the compilers to communicate a hint about expected branch-target behavior
(see Common Architecture, Chapter 4):


[Figure: Memory Format, with fields at bits <31:16> and <15:0>]


If the JSR is used for a computed GOTO or a CASE statement, compile bits
<15:14> as 00, and bits <13:0> such that (updated PC+Instr<13:0>*4) <15:0>
equals (likely_target_addr) <15:0>. In other words, pick the low 14 bits so that
a normal PC+displacement*4 calculation will match the low 16 bits of the most
likely target longword address. (Implementations will likely prefetch from the
matching cache block.)


If the JSR is used for a computed subroutine call, compile bits <15:14> as 01, 
and bits <13:0> as above. Some implementations will prefetch the call target 
using the prediction and also push updated PC on a return-prediction stack. 


If the JSR is used as a subroutine return, compile bits <15:14> as 10. Some 
implementations will pop an address off a return-prediction stack. 


If the JSR is used as a coroutine linkage, compile bits <15:14> as 11. Some 
implementations will pop an address off a return-prediction stack and also push 
updated PC on the return-prediction stack. 
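
The hint encoding in rule 5 can be sketched as follows. The constant and function names are hypothetical, and setting bits <13:0> to zero for returns and coroutine linkage is an assumption, since the text specifies those bits only for the first two cases.

```python
# Values for hint bits <15:14>, as listed in rule 5 (names are illustrative).
HINT_GOTO, HINT_CALL, HINT_RETURN, HINT_COROUTINE = 0b00, 0b01, 0b10, 0b11


def jsr_hint_field(kind, updated_pc=0, likely_target=0):
    """Build the 16-bit hint: bits <15:14> = kind, bits <13:0> = target hint."""
    if kind in (HINT_GOTO, HINT_CALL):
        # Longword displacement from updated PC, truncated to 14 bits.
        low14 = ((likely_target - updated_pc) >> 2) & 0x3FFF
    else:
        low14 = 0  # assumption: no target hint is defined for these cases
    return (kind << 14) | low14


def predicted_low16(updated_pc, hint):
    """Low 16 bits of (updated PC + hint<13:0> * 4)."""
    return (updated_pc + (hint & 0x3FFF) * 4) & 0xFFFF


pc, target = 0x20004, 0x12348
hint = jsr_hint_field(HINT_CALL, pc, target)
print(hex(hint), hex(predicted_low16(pc, hint)), hex(target & 0xFFFF))
```

Because both addresses are longword-aligned, truncating the longword displacement to 14 bits preserves the low 16 bits of the target byte address, for backward as well as forward targets.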


Implementors should give first priority to executing straight-line code with no 
branch-takens as quickly as possible, second priority to predicting conditional 
branches based on the sign of the displacement field (backward taken, forward not- 
taken), and third priority to predicting subroutine return addresses by running a 
small prediction stack. (VAX traces show that a stack of 2 to 4 entries correctly
predicts most branches.)
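
The second and third priorities can be modeled in a few lines. This is a toy sketch, not a description of any implementation: the class and function names are hypothetical, and dropping the oldest entry on overflow is an assumed policy.

```python
from collections import deque


def predict_taken(displacement):
    """Static rule: backward branches predicted taken, forward not-taken."""
    return displacement < 0


class ReturnPredictor:
    """A small return-prediction stack (depth of 2 to 4 suggested by the text)."""

    def __init__(self, depth=4):
        self.stack = deque(maxlen=depth)  # oldest entry is dropped on overflow

    def call(self, return_address):
        """Record the updated PC when a subroutine call is seen."""
        self.stack.append(return_address)

    def predict_return(self):
        """Pop the predicted return address, or None if the stack is empty."""
        return self.stack.pop() if self.stack else None


rp = ReturnPredictor(depth=2)
for address in (0x100, 0x200, 0x300):
    rp.call(address)
print(rp.predict_return(), rp.predict_return(), rp.predict_return())
```

With a depth of 2, the third nested call displaces the oldest entry, so the outermost return is not predicted. That is the cost of keeping the stack small.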


A.2.4 Improving I-Stream Density — Factor of 3


Compilers should try to use profiles to make sure almost 100 percent of the bytes 
brought into an I-cache are actually executed. This means aligning branch targets 
and putting rarely executed code out of line. Doing so would consistently make an 
I-cache appear about two times larger, compared to current VAX practice. 


The example below shows the bytes actually brought into a VAX cache (from part of 
an address trace of a DLINPAC). The dots represent bytes brought into the cache 
but never executed. They occupy about half of the cache. 


Each line shows the use of an aligned 64-byte I-cache block. A portion of DLINPAC 
and a portion of OpenVMS 4.x are shown. Uppercase I is the first byte of an 
instruction, and lowercase i marks subsequent bytes. Period (.) shows a byte 
brought into the cache but never executed. 


[Figure: per-byte usage of aligned 64-byte I-fetch blocks, Byte 0 through Byte 63, for a portion of DLINPAC and a portion of OpenVMS]


A.2.5 Instruction Scheduling — Factor of 3 


The performance of Alpha programs will be sensitive to how carefully the code is 
scheduled to minimize instruction-issue delays. 


“Result latency” is defined as the number of CPU cycles that must elapse between an
instruction that writes a result register and one that uses that register, if
execution-time stalls are to be avoided. Thus, a latency of zero means that the
instruction writes a result register and the instruction that uses that register can be
multiple-issued in the same cycle. A latency of 2 means that if the writing instruction
is issued at cycle N, the reading instruction can issue no earlier than cycle N+2.
Latency is implementation-specific.


Most Alpha instructions have a non-zero result latency. Compilers should schedule
code so that a result is not used too soon, at least in frequently executed code (inner
loops, as identified by execution profiles). In general, this will require loop unrolling
and short procedure inlining.


“Too soon” is currently ill-defined, since no implementations have been designed yet.
For starters, assume that implementations can dual-issue instructions. Assume
that Load and JSR instructions have a latency of 3, shifts and byte manipulation a
latency of 2, integer multiply a latency of 10, and other integer operates a latency
of 1. Assume floating multiply has a latency of 5, floating divide a latency of 10,
and other floating operates a latency of 4. Scheduling to these latencies will give
at least reasonable performance on currently anticipated implementations. More
precise tables will be supplied in later versions of this Appendix, as the information
becomes available.
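
To see how the assumed latencies translate into stalls, the sketch below computes issue cycles for a straight-line sequence. It is a deliberately simplified model: single-issue and in-order, with hypothetical instruction class names and a hypothetical tuple format of (class, destination, sources). It is not a description of any Alpha implementation.

```python
# Assumed result latencies from the paragraph above, in cycles.
ASSUMED_LATENCY = {
    "load": 3, "jsr": 3, "shift": 2, "byte_manipulation": 2,
    "integer_multiply": 10, "integer_other": 1,
    "floating_multiply": 5, "floating_divide": 10, "floating_other": 4,
}


def issue_cycles(instructions):
    """Return the earliest issue cycle of each (kind, dest, sources) tuple."""
    ready = {}        # register -> first cycle at which its value can be used
    cycles = []
    next_cycle = 0    # single-issue: one instruction per cycle at most
    for kind, dest, sources in instructions:
        cycle = max([next_cycle] + [ready.get(reg, 0) for reg in sources])
        cycles.append(cycle)
        if dest is not None:
            ready[dest] = cycle + ASSUMED_LATENCY[kind]
        next_cycle = cycle + 1
    return cycles


naive = [("load", "r1", []), ("integer_other", "r2", ["r1"])]
scheduled = [("load", "r1", []), ("integer_other", "r4", []),
             ("integer_other", "r5", []), ("integer_other", "r2", ["r1"])]
print(issue_cycles(naive), issue_cycles(scheduled))
```

In the naive sequence the dependent add waits until cycle 3. In the scheduled sequence two independent instructions fill those cycles and the add still issues at cycle 3, so no issue slot is wasted.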


Compilers should try to schedule code to match the above latency rules and also to
match the multiple-issue rules. If doing both is impractical for a particular sequence 
of code, the latency rules are more important (since they apply even in single-issue 
implementations). 


Implementors should give first priority to minimizing the latency of back-to-back
integer operations, of address calculations immediately followed by load/store, of load 
immediately followed by branch, and of compare immediately followed by branch. 
Second priority should be given to minimizing latencies in general. 


A.3 Data-Stream Considerations 
The following sections describe considerations for the data stream. 


A.3.1 Data Alignment — Factor of 10 


Data PSECTs should be at least octaword-aligned, so that aggregates (arrays, some 
records, subroutine stack frames) can be allocated on aligned octaword boundaries 
to take advantage of any implementations with aligned octaword data paths, and to 
decrease the number of cache fills in almost all implementations. 


Aggregates (arrays, records, common blocks, and so forth) should be allocated on 
at least aligned octaword boundaries whenever language rules allow this. In some 
implementations, a series of writes that completely fill a cache block may be a factor 
of 10 faster than a series of writes that partially fill a cache block, when that cache 
block would give a read miss. This is true of writeback caches that read a partially 
filled cache block from memory, but optimize away the read for completely filled
blocks. 


For such implementations, long strings of sequential writes will be faster if they 
start on a cache-block boundary (a multiple of 128 bytes will do well for most, if not 
all, Alpha implementations). This applies to array results that sweep through large 
portions of memory, and also to register-save areas for context switching, graphics 
frame buffer accesses, and other places where exactly 8, 16, 32, or more quadwords 
are stored sequentially. Allocating the targets at multiples of 8, 16, 32, or more 
quadwords, respectively, and doing the writes in order of increasing address will
maximize the write speed. 


Items within aggregates that are forced to be unaligned (records, common blocks)
should generate compile-time warning messages and inline byte extract/insert code.
Users must be educated that the warning message means that they are taking a
factor of 30 performance hit.


Compilers should consider supplying a switch that allows the compiler to pad 
aggregates to avoid unaligned data. 


Compiled code for parameters should assume that the parameters are aligned. 
Unaligned actuals will therefore cause runtime alignment traps and very slow 
fixups. The fixup routine, if invoked, should generate warning messages to the 
user, preferably giving the first few statement numbers that are doing unaligned 
parameter access, and at the end of a run the total number of alignment traps (and 
perhaps an estimate of the performance improvement if the data were aligned). 
Again, users must be educated that the trap routine warning message means they 
are taking a factor of 30 performance hit. 


Frequently used scalars should reside in registers. Each scalar datum allocated 
in memory should normally be allocated an aligned quadword to itself, even if the 
datum is only a byte wide. This allows aligned quadword loads and stores and avoids 
partial-quadword writes (which may be half as fast as full-quadword writes, due to 
such factors as read-modify-write a quadword to do quadword ECC calculation). 


Implementors should give first priority to fast reads of aligned octawords and second 
priority to fast writes of full cache blocks. Partial-quadword writes need not have a 
fast repetition rate.


A.3.2 Shared Data in Multiple Processors — Factor of 3 


Software locks are aligned quadwords and should be allocated to large cache blocks 
that either contain no other data, or read-mostly data whose usage is correlated with 
the lock.


Whenever there is high contention for a lock, one processor will have the lock and 
be using the guarded data, while other processors will be in a read-only spin loop on 
the lock bit. Under these circumstances, any write to the cache block containing the 
lock will likely cause excess bus traffic and cache fills, thus having a performance 
impact on all processors that are involved, and the buses between them. In some 
decomposed FORTRAN programs, refills of the cache blocks containing one or two 
frequently used locks can account for a third of all the bus bandwidth the program 
consumes.


Whenever there is almost no contention for a lock, one processor will have the lock 
and be using the guarded data. Under these circumstances, it might be desirable to 
keep the guarded data in the same cache block as the lock. 


For the high sharing case, compilers should assume that almost all accesses to 
shared data result in cache misses all the way back to main memory, for each distinct 
cache block used. Such accesses will likely be a factor of 30 slower than cache hits. 
It is helpful to pack correlated shared data into a small number of cache blocks. It is 
helpful also to segregate blocks written by one processor from blocks read by others. 


Therefore, accesses to shared data, including locks, should be minimized. For
example, a 4-processor decomposition of some manipulation of a 1000-row array
should avoid accessing lock variables every row, but instead might access a lock
variable every 250 rows.


Array manipulation should be partitioned across processors so that cache blocks do 
not thrash between processors. Having each of 4 processors work on every fourth 
array element severely impairs performance on any implementation with a cache 
block of 4 elements or larger. The processors all contend for copies of the same cache 
blocks and use only 1/4 of the data in each block. Writes in one processor severely 
impair cache performance on all processors. 


A better decomposition is to give each processor the largest possible contiguous
chunk of data to work on (N/4 consecutive rows for 4 processors and row-major
array storage; N/4 columns for column-major storage). With the possible exception
of 3 cache blocks at the partition boundaries, this decomposition will result in each
processor caching data that is touched by no other processor.
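
The difference between the two decompositions can be counted directly. The sketch below assumes a one-dimensional array that starts on a cache-block boundary. The function names are hypothetical.

```python
def shared_blocks(n_elements, elems_per_block, owner):
    """Count cache blocks touched by more than one processor."""
    owners_by_block = {}
    for i in range(n_elements):
        owners_by_block.setdefault(i // elems_per_block, set()).add(owner(i))
    return sum(1 for owners in owners_by_block.values() if len(owners) > 1)


def interleaved_owner(n_procs):
    """Each processor works on every n_procs-th element."""
    return lambda i: i % n_procs


def contiguous_owner(n_elements, n_procs):
    """Each processor works on one contiguous chunk."""
    chunk = -(-n_elements // n_procs)  # ceiling division
    return lambda i: i // chunk


N, P, BLOCK = 1000, 4, 4
print(shared_blocks(N, BLOCK, interleaved_owner(P)))
print(shared_blocks(N, BLOCK, contiguous_owner(N, P)))
```

With 1000 elements, 4 processors, and 4 elements per block, the interleaved scheme shares every one of the 250 blocks, while the contiguous scheme shares only blocks that straddle a partition boundary (at most 3).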


Operating-system scheduling algorithms should attempt to minimize process 
migration from one processor to another. Any time migration occurs, there are likely 
to be a large number of cache misses on the new processor. 


Similarly, operating-system scheduling algorithms should attempt to enforce some
affinity between a given device’s interrupts and the processor on which the
interrupt-handler runs. I/O control data structures and locks for different devices
should be disjoint. Doing both of these allows higher cache hit rates on the
corresponding I/O control data structures.


Implementors should give first priority to an efficient (low-bandwidth) way of 
transferring isolated lock values and other isolated, shared write data between 
processors. 


Implementors should assume that the amount of shared data will continue to 
increase, so over time the need for efficient sharing implementations will also 
increase. 


A.3.3 Avoiding Cache/TB Conflicts — Factor of 1 


Occasionally, programs that run with a direct-mapped cache or TB will thrash, 
taking excessive cache or TB misses. With some work, thrashing can be minimized 
at compile time. 


In a frequently executed loop, compilers could allocate the data items accessed from 
memory so that, on each loop iteration, all of the memory addresses accessed are 
either in exactly the same aligned 64-byte block, or differ in bits VA<10:6>. For loops 
that go through arrays in a common direction with a common stride, this means 
allocating the arrays, checking that the first-iteration addresses differ, and if not, 
inserting up to 64 bytes of padding between the arrays. This rule will avoid thrashing 
in small direct-mapped data caches with block sizes up to 64 bytes and total sizes 
of 2K bytes or more.


Example:

    REAL*4 A(1000),B(1000)
    DO 60 i=1,1000
 60 A(i) = f(B(i))


BAD allocation (A and B thrash in 8 KB direct-mapped cache): 





[Figure: arrays A and B on an address axis marked 0, 4K, 8K, 12K, 16K]


BETTER allocation (A and B offset by 64 mod 2 KB, so 16 elements of A and 16 of 
B can be in cache simultaneously): 





[Figure: arrays A and B on an address axis marked 0, 4K, 8K+64, 12K, 16K]


BEST allocation (A and B offset by 64 mod 2 KB, so 16 elements of A and 16 of B 
can be in cache simultaneously, and both arrays fit entirely in 8 KB or bigger cache): 





[Figure: arrays A and B on an address axis marked 0, 4K-64, 8K, 12K, 16K]
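
The three allocations can be checked against the VA<10:6> rule with a short sketch. It assumes that A starts at address 0 in every case and that B starts at 8K, 8K+64, and 4K-64 respectively, which is how the axis labels of the figures read; the function names are hypothetical.

```python
BLOCK_SHIFT = 6          # 64-byte aligned blocks
INDEX_MASK = 0x1F        # five bits, VA<10:6>


def cache_index(addr):
    """Return VA<10:6> of a byte address."""
    return (addr >> BLOCK_SHIFT) & INDEX_MASK


def thrashes(base_a, base_b, elem_size=4, n_elems=1000):
    """True if some iteration touches two different blocks with equal VA<10:6>."""
    for i in range(n_elems):
        a = base_a + i * elem_size
        b = base_b + i * elem_size
        different_block = (a >> BLOCK_SHIFT) != (b >> BLOCK_SHIFT)
        if different_block and cache_index(a) == cache_index(b):
            return True
    return False


bad = thrashes(0, 8 * 1024)            # B at 8K
better = thrashes(0, 8 * 1024 + 64)    # B at 8K+64
best = thrashes(0, 4 * 1024 - 64)      # B at 4K-64
```

Only the first layout violates the rule, because an offset of exactly 8K leaves bits <10:6> of A(i) and B(i) identical on every iteration.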


In a frequently executed loop, compilers could allocate the data items accessed from 
memory so that, on each loop iteration, all of the memory addresses accessed are 
either in exactly the same 8 KB page, or differ in bits VA<17:13>. For loops that go 
through arrays in a common direction with a common stride, this means allocating 
the arrays, checking that the first-iteration addresses differ, and if not, inserting 
up to 8K bytes of padding between the arrays. This rule will avoid thrashing in 
direct-mapped TBs and in some large direct-mapped data caches, with total sizes of 
32 pages (256 KB) or more. 


Usually, this padding will mean zero extra bytes in the executable image, just a skip 
in virtual address space to the next-higher page boundary. 


For large caches, the rule above should be applied to the I-stream, in addition to 
all the D-stream references. Some implementations will have combined
I-stream/D-stream large caches.


Both of the rules above can be satisfied simultaneously, thus often eliminating 
thrashing in all anticipated direct-mapped cache/TB implementations. 
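
As a rough illustration of satisfying both rules at once, the sketch below searches for the smallest padding, in 64-byte steps, that makes two first-iteration addresses pass the VA<10:6> rule and the VA<17:13> rule together. The search strategy and the names `rule_ok` and `padding_for` are assumptions made for this example, not a procedure given in the text.

```python
def field(addr, hi, lo):
    """Extract bits <hi:lo> of an address."""
    return (addr >> lo) & ((1 << (hi - lo + 1)) - 1)


def rule_ok(a, b, hi, lo):
    """Same aligned 2**lo-byte unit, or different VA<hi:lo>."""
    return (a >> lo) == (b >> lo) or field(a, hi, lo) != field(b, hi, lo)


def padding_for(first_a, first_b, step=64, limit=8192 + 64):
    """Smallest padding before the second array that satisfies both rules."""
    pad = 0
    while pad <= limit:
        b = first_b + pad
        if rule_ok(first_a, b, 10, 6) and rule_ok(first_a, b, 17, 13):
            return pad
        pad += step
    raise ValueError("no padding found within limit")
```

For example, `padding_for(0, 8192)` needs only 64 bytes, while two arrays that start 256 KB apart collide under both rules and need a page skip plus one block.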


A.3.4 Sequential Read/Write — Factor of 1


All other things being equal, sequences of consecutive reads or writes should use
ascending (rather than descending) memory addresses. Where possible, the memory
address for a block of 2**K bytes should be on a 2**K boundary, since this minimizes
the number of different cache blocks used and minimizes the number of partially 
written cache blocks. 
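
The effect of alignment on block counts can be worked through with a small sketch, assuming 64-byte cache blocks. The helper names are hypothetical.

```python
def blocks_touched(addr, size, block=64):
    """Number of distinct cache blocks covered by [addr, addr+size)."""
    first = addr // block
    last = (addr + size - 1) // block
    return last - first + 1


def partially_written(addr, size, block=64):
    """Number of covered cache blocks that are not written in full."""
    first = addr // block
    last = (addr + size - 1) // block
    head = addr % block != 0
    tail = (addr + size) % block != 0
    if first == last:
        return 1 if (head or tail) else 0
    return int(head) + int(tail)
```

A 256-byte transfer at 0x1000 touches 4 blocks, all fully written; the same transfer at 0x1020 touches 5 blocks, 2 of them only partially written.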


To avoid overrunning memory bandwidth, sequences of more than eight quadword
Loads or Stores should be broken up with intervening instructions (if there is any 
useful work to be done). 


For consecutive reads, implementors should give first priority to prefetching 
ascending cache blocks, and second priority to absorbing up to eight consecutive 
quadword Loads (aligned on a 64-byte boundary) without stalling. 


For consecutive writes, implementors should give first priority to avoiding read 
overhead for fully written aligned cache blocks, and second priority to absorbing
up to eight consecutive quadword Stores (aligned on a 64-byte boundary) without
stalling.


A.3.5 Prefetching — Factor of 3 


To use FETCH and FETCH_M effectively, software should follow this programming
model:


1. Assume that at most two FETCH instructions can be outstanding at once, 
and that there are two prefetch address registers, PREa and PREb, to hold 
prefetching state. FETCH instructions alternate between loading PREa and 
PREb. Each FETCH instruction overwrites any previous prefetching state, thus 
terminating any previous prefetch that is still in progress in the register that is 
loaded. The order of fetching within a block and the order between PREa and 
PREb are UNPREDICTABLE. 


IMPLEMENTATION NOTE 
Implementations are encouraged to alternate at 
convenient intervals between PREa and PREb. 


2. Assume, for maximum efficiency, that there should be about 64 unrelated memory 
access instructions (load or store) between a FETCH and the first actual data 
access to the prefetched data. 


3. Assume, for instruction-scheduling purposes in a multilevel cache hierarchy, that 
FETCH does not prefetch data to the innermost cache level, but rather one level 
out. Schedule loads to bury the last level of misses. 


4. Assume that FETCH is worthwhile if, on average, at least half the data in a 
block will be accessed. Assume that FETCH_M is worthwhile if, on average, at 
least half the data in a block will be modified. 


5. Treat FETCH as a vector load. If a piece of code could usefully prefetch 4
operands, launch the first two prefetches, do about 128 memory references
worth of work, then launch the next two prefetches, do about 128 more memory
references worth of work, then start using the 4 sets of prefetched data.


6. Treat FETCH as having the same effect on a cache as a series of 64 quadword 
loads. If the loads would displace useful data, so will FETCH. If two sets of loads 
from specific addresses will thrash in a direct-mapped cache, so will two FETCH
instructions using the same pair of addresses. 


IMPLEMENTATION NOTE 
Hardware implementations are expected to provide
either no support for FETCHx or support that closely
matches this model.


A.4 Code Sequences 
The following section describes code sequences. 


A.4.1 Aligned Byte/Word Memory Accesses 


The instruction sequences given in Common Architecture, Chapter 4 for byte and 
word accesses are worst-case code. In the common case of accessing a byte or aligned
word field at a known offset from a pointer that is expected to be at least longword 
aligned, the common-case code is much shorter. 


“Expected” means that the code should run fast for a longword-aligned pointer and 
trap for unaligned. The trap handler may at its option fix up the unaligned reference. 


For access at a known offset D from a longword-aligned pointer Rx, let D.lw be D 
rounded down to a multiple of 4 ((D div 4)*4), and let D.mod be D mod 4. 


In the common case, the intended sequence for loading and zero-extending an aligned
word is:

    LDL    R1,D.lw(Rx)            ! Traps if unaligned
    EXTWL  R1,#D.mod,R1           ! Picks up word at byte 0 or byte 2

In the common case, the intended sequence for loading and sign-extending an aligned
word is:

    LDL    R1,D.lw(Rx)            ! Traps if unaligned
    SLL    R1,#48-8*D.mod,R1      ! Aligns word at high end of R1
    SRA    R1,#48,R1              ! SEXT to low end of R1

NOTE
The shifts often can be combined with shifts that
might surround subsequent arithmetic operations (for
example, to produce word overflow from the high end of
a register).

In the common case, the intended sequence for loading and zero-extending a byte is:

    LDL    R1,D.lw(Rx)            !
    EXTBL  R1,#D.mod,R1           !

In the common case, the intended sequence for loading and sign-extending a byte is:

    LDL    R1,D.lw(Rx)
    SLL    R1,#56-8*D.mod,R1      !
    SRA    R1,#56,R1


In the common case, the intended sequence for storing an aligned word R5 is:

    LDL    R1,D.lw(Rx)            !
    INSWL  R5,#D.mod,R3           !
    MSKWL  R1,#D.mod,R1           !
    BIS    R3,R1,R1               !
    STL    R1,D.lw(Rx)            !


In the common case, the intended sequence for storing a byte R5 is:

    LDL    R1,D.lw(Rx)
    INSBL  R5,#D.mod,R3           !
    MSKBL  R1,#D.mod,R1
    BIS    R3,R1,R1               !
    STL    R1,D.lw(Rx)
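
The arithmetic behind these sequences can be modeled in a few lines. The sketch below is an illustration only: it assumes little-endian memory held in a `bytearray`, models the unaligned trap as a Python exception, and uses hypothetical function names. The `size` parameter is 2 for the word sequences and 1 for the byte sequences.

```python
MASK64 = (1 << 64) - 1


def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def split_offset(d):
    """Return (D.lw, D.mod) for a known offset D."""
    return (d // 4) * 4, d % 4


def ldl(mem, addr):
    """LDL: load a longword, sign-extended into a 64-bit register image."""
    if addr % 4:
        raise ValueError("unaligned longword access")  # stands in for the trap
    return sext(int.from_bytes(mem[addr:addr + 4], "little"), 32) & MASK64


def load_zext(mem, rx, d, size):
    """LDL followed by EXTWL (size=2) or EXTBL (size=1)."""
    d_lw, d_mod = split_offset(d)
    r1 = ldl(mem, rx + d_lw)
    return (r1 >> (8 * d_mod)) & ((1 << (8 * size)) - 1)


def load_sext(mem, rx, d, size):
    """LDL, then SLL to the high end of the register, then SRA back down."""
    d_lw, d_mod = split_offset(d)
    r1 = ldl(mem, rx + d_lw)
    r1 = (r1 << (64 - 8 * size - 8 * d_mod)) & MASK64   # SLL
    return sext(r1, 64) >> (64 - 8 * size)               # SRA


def store(mem, rx, d, r5, size):
    """LDL, INSxL, MSKxL, BIS, STL."""
    d_lw, d_mod = split_offset(d)
    field_mask = ((1 << (8 * size)) - 1) << (8 * d_mod)
    r1 = ldl(mem, rx + d_lw)
    r3 = (r5 << (8 * d_mod)) & field_mask                # INSxL
    r1 &= ~field_mask & MASK64                           # MSKxL
    r1 |= r3                                             # BIS
    addr = rx + d_lw
    mem[addr:addr + 4] = (r1 & 0xFFFFFFFF).to_bytes(4, "little")  # STL
```

For a word, the left-shift count `64 - 8*size - 8*d_mod` reduces to the `48-8*D.mod` used above, and for a byte to `56-8*D.mod`.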


A.4.2 Division 


In all implementations, floating-point division is likely to have a substantially longer 
result latency than floating-point multiply; in addition, in many implementations 
multiplies will be pipelined and divides will not. 


Thus, any division by a constant power of two should be compiled as a multiply 
by the exact reciprocal, if it is representable without overflow or underflow. If 
language rules or surrounding context allow, other divisions by constants can be 
closely approximated via multiplication by the reciprocal. 


Integer division does not exist as a hardware opcode. Division by a constant can 
always be done via UMULH of another appropriate constant, followed by a right 
shift. General quadword division by true variables can be done via a subroutine. 
The subroutine could test for small divisors (less than about 1000 in absolute value) 
and for those, do a table lookup on the exact constant and shift count for an
UMULH/shift sequence. For the remaining cases, a table lookup on about a 1000-entry
table and a multiply can give a linear approximation to 1/divisor that is accurate to 
16 bits. Using this approximation, a multiply and a back-multiply and a subtract 
can generate one 16-bit quotient “digit” plus a 48-bit new partial dividend. Three 
more such steps can generate the full quotient. Having prior knowledge of the 
possible sizes of the divisor and dividend, normalizing away leading bytes of zeros, 
and performing an early-out test can reduce the average number of multiplies to 
about 5 (compared to a best case of 1 and a worst case of 9). 
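
The UMULH/shift technique for division by a constant can be sketched as follows. To keep the multiplier within 64 bits, this sketch assumes unsigned dividends below 2**32 and constant divisors from 2 up to 2**32-1; that restriction, the way the constant is chosen, and the function names are assumptions of the example rather than anything specified above.

```python
MASK64 = (1 << 64) - 1


def umulh(a, b):
    """High 64 bits of the unsigned 128-bit product, as UMULH returns."""
    return ((a & MASK64) * (b & MASK64)) >> 64


def magic_for(d):
    """Return (multiplier, shift) for dividing a 32-bit unsigned value by d."""
    if not 2 <= d < (1 << 32):
        raise ValueError("divisor out of range for this sketch")
    shift = (d - 1).bit_length() - 1            # 2**shift <= d - 1
    multiplier = -(-(1 << (64 + shift)) // d)   # ceil(2**(64+shift) / d)
    return multiplier, shift


def div_by_const(n, d):
    """Compute n // d with one UMULH and one right shift."""
    multiplier, shift = magic_for(d)
    return umulh(n, multiplier) >> shift
```

The multiplier and shift would be computed once, at compile time or when building the small-divisor table; only the `umulh` and the shift execute per division.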


A.4.3 Stylized Code Forms 


Using the same stylized code form for a common operation makes compiler output 
a little more readable and makes it more likely that an implementation will speed 
up the stylized form. 


A.4.3.1 NOP
The standard NOP forms are: 


NOP == BIS R31,R31,R31 
FNOP == CPYS F31,F31,F31 


These generate no exceptions. In most implementations, they should encounter no 
operand issue delays, no destination issue delay, and no functional unit issue delay. 
Implementations are free to optimize these into no action and zero execution cycles. 


A.4.3.2 Clear a Register 
The standard clear register forms are: 


CLR Rx == BIS R31,R31,Rx
FCLR Fx == CPYS F31,F31,Fx


These generate no exceptions. In most implementations, they should encounter no 
operand issue delays, and no functional unit issue delay. 


A.4.3.3 Load Literal 
The standard load integer literal (ZEXT 8-bit) form is:

MOV #lit8,Ry == BIS R31,lit8,Ry


The Alpha literal construct in Operate instructions creates a canonical longword 
constant for values 0..255. 


A longword constant stored in an Alpha 64-bit register is in canonical form when 
bits <63:32>=bit <31>. 


A canonical 32-bit literal can usually be generated with one or two instructions, but 
sometimes three instructions are needed. Use the following procedure to determine 
the offset fields of the instructions:


val  = <sign-extended, 32-bit value>
low  = val<15:0>
tmp1 = val - SEXT(low)              ! Account for LDA instruction
high = tmp1<31:16>
tmp2 = tmp1 - SHIFT_LEFT( SEXT(high,16) )

if tmp2 NE 0 then
    ! original val was in range 7FFF8000₁₆..7FFFFFFF₁₆
    extra = 4000₁₆
    tmp1 = tmp1 - 40000000₁₆
    high = tmp1<31:16>
else
    extra = 0
endif


The general sequence is: 


LDA  Rdst, low(R31)
LDAH Rdst, extra(Rdst)    ! Omit if extra=0
LDAH Rdst, high(Rdst)     ! Omit if high=0
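
The procedure can be checked with a short model. The sketch below follows the steps above and then emulates the LDA/LDAH sequence, on the assumption that LDA adds the sign-extended 16-bit displacement to the base register and LDAH adds the sign-extended displacement times 65536. The function names are hypothetical.

```python
def sext(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def split_literal(val):
    """Return the (low, extra, high) offset fields for a 32-bit literal."""
    val = sext(val, 32)
    low = val & 0xFFFF
    tmp1 = val - sext(low, 16)               # account for the LDA instruction
    high = (tmp1 >> 16) & 0xFFFF
    tmp2 = tmp1 - (sext(high, 16) << 16)
    if tmp2 != 0:
        # original val was in the range 0x7FFF8000..0x7FFFFFFF
        extra = 0x4000
        tmp1 -= 0x40000000
        high = (tmp1 >> 16) & 0xFFFF
    else:
        extra = 0
    return low, extra, high


def emulate(low, extra, high):
    """Model LDA / LDAH / LDAH and return the signed 64-bit register value."""
    rdst = sext(low, 16)                     # LDA  Rdst, low(R31)
    if extra:
        rdst += sext(extra, 16) << 16        # LDAH Rdst, extra(Rdst)
    if high:
        rdst += sext(high, 16) << 16         # LDAH Rdst, high(Rdst)
    return sext(rdst, 64)
```

For 0x7FFF8000 the fields come out as low = 0x8000, extra = 0x4000, high = 0x4000, which is the three-instruction case: the sign-extended low part is negative, and a single LDAH cannot add 0x80000000 because its displacement is also sign-extended.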


A.4.3.4 Register-to-Register Move
The standard register move forms are: 


MOV RX,RY == BIS RX,RX,RY
FMOV FX,FY == CPYS FX,FX,FY 


These generate no exceptions. In most implementations, these should encounter no 
functional unit issue delay. 


A.4.3.5 Negate 


The standard register negate forms are: 


NEGz Rx,Ry == SUBz R31,Rx,Ry      ! z = L or Q
NEGz Fx,Fy == SUBz F31,Fx,Fy      ! z = F, G, S, or T
FNEGz Fx,Fy == CPYSN Fx,Fx,Fy     ! z = F, G, S, or T


The integer subtract generates no Integer Overflow trap if Rx contains the largest 
negative number (SUBz/V would trap). The floating subtract generates a floating- 
point exception for a non-finite value in Fx. The CPYSN form generates no 
exceptions. 


A.4.3.6 NOT 
The standard integer register NOT form is: 

NOT Rx,Ry == ORNOT R31,Rx,Ry


This generates no exceptions. In most implementations, this should encounter no 
functional unit issue delay.


A.4.3.7 Booleans 
The standard alternative to BIS is:

OR Rx,Ry,Rz == BIS Rx,Ry,Rz

The standard alternative to BIC is:

ANDNOT Rx,Ry,Rz == BIC Rx,Ry,Rz

The standard alternative to EQV is:

XORNOT Rx,Ry,Rz == EQV Rx,Ry,Rz


A.4.4 Trap Barrier 


The TRAPB instruction guarantees that following instructions do not issue until all
possible preceding traps have been signaled. This does not mean that all preceding
instructions have necessarily run to completion (for example, a Load instruction may
have passed all the fault checks but not yet delivered data from a cache miss).


A.4.5 Pseudo-Operations (Stylized Code Forms) 


This section summarizes the pseudo-operations for the Alpha architecture that may
be used by various software components in an Alpha system. Most of these forms
are discussed in preceding sections.


In the context of this section, pseudo-operations all represent a single underlying
machine instruction. Each pseudo-operation represents a particular instruction
with either replicated fields (such as FMOV), or hard-coded zero fields. Since the
pattern is distinct, these pseudo-operations can be decoded by instruction decode
mechanisms.


In Table A-1, the pseudo-operation codes can be viewed as macros with parameters.
The formal form is listed in the left column, and the expansion in the code stream
is listed in the right column.


Some instruction mnemonics have synonyms. These are different from pseudo-operations
in that each synonym represents the same underlying instruction with
no special encoding of operand fields. As a result, synonyms cannot be distinguished
from each other. They are not listed in the table that follows. Examples of synonyms
are: BIC/ANDNOT, BIS/OR, and EQV/XORNOT.


Table A-1: Decodable Pseudo-Operations (Stylized Code Forms) 


Pseudo-Operation in Listing              Actual Instruction Encoding

No-exception generic floating absolute value:
    FABS      Fx, Fy                     CPYS      F31, Fx, Fy
Branch to target (21-bit signed displacement):
    BR        target                     BR        R31, target
Clear integer register:
    CLR       Rx                         BIS       R31, R31, Rx
Clear a floating-point register:
    FCLR      Fx                         CPYS      F31, F31, Fx
Floating-point move:
    FMOV      Fx, Fy                     CPYS      Fx, Fx, Fy
No-exception generic floating negation:
    FNEG      Fx, Fy                     CPYSN     Fx, Fx, Fy
Floating-point no-op:
    FNOP                                 CPYS      F31, F31, F31
Move Rx/8-bit zero-extended literal to Ry:
    MOV       {Rx/Lit8}, Ry              BIS       R31, {Rx/Lit8}, Ry
Move 16-bit sign-extended literal to Rx:
    MOV       Lit, Rx                    LDA       Rx, lit(R31)
Move to FPCR:
    MT_FPCR   Fx                         MT_FPCR   Fx, Fx, Fx
Move from FPCR:
    MF_FPCR   Fx                         MF_FPCR   Fx, Fx, Fx
Negate F_floating:
    NEGF      Fx, Fy                     SUBF      F31, Fx, Fy
Negate F_floating, semi-precise:
    NEGF/S    Fx, Fy                     SUBF/S    F31, Fx, Fy
Negate G_floating:
    NEGG      Fx, Fy                     SUBG      F31, Fx, Fy
Negate G_floating, semi-precise:
    NEGG/S    Fx, Fy                     SUBG/S    F31, Fx, Fy
Negate longword:
    NEGL      {Rx/Lit8}, Ry              SUBL      R31, {Rx/Lit}, Ry
Negate longword with overflow detection:
    NEGL/V    {Rx/Lit8}, Ry              SUBL/V    R31, {Rx/Lit}, Ry
Negate quadword:
    NEGQ      {Rx/Lit8}, Ry              SUBQ      R31, {Rx/Lit}, Ry
Negate quadword with overflow detection:
    NEGQ/V    {Rx/Lit8}, Ry              SUBQ/V    R31, {Rx/Lit}, Ry
Negate S_floating:
    NEGS      Fx, Fy                     SUBS      F31, Fx, Fy
Negate S_floating, software with underflow detection:
    NEGS/SU   Fx, Fy                     SUBS/SU   F31, Fx, Fy
Negate S_floating, software with underflow and inexact result detection:
    NEGS/SUI  Fx, Fy                     SUBS/SUI  F31, Fx, Fy
Negate T_floating:
    NEGT      Fx, Fy                     SUBT      F31, Fx, Fy
Negate T_floating, software with underflow detection:
    NEGT/SU   Fx, Fy                     SUBT/SU   F31, Fx, Fy
Negate T_floating, software with underflow and inexact result detection:
    NEGT/SUI  Fx, Fy                     SUBT/SUI  F31, Fx, Fy
Integer no-op:
    NOP                                  BIS       R31, R31, R31
Logical NOT of Rx/8-bit zero-extended literal storing results in Ry:
    NOT       {Rx/Lit8}, Ry              ORNOT     R31, {Rx/Lit}, Ry
Longword sign-extension of Rx storing results in Ry:
    SEXTL     {Rx/Lit8}, Ry              ADDL      R31, {Rx/Lit}, Ry


A.5 Timing Considerations: Atomic Sequences 


A sufficiently long instruction sequence between LDx_L and STx_C will never
complete, because periodic timer interrupts will always occur before the sequence
completes. The following rules describe sequences that will eventually complete in
all Alpha implementations:


1. At most 40 operate or conditional-branch (not taken) instructions executed in the
sequence between LDx_L and STx_C.


2. At most two I-stream TB-miss faults. Sequential instruction execution
guarantees this.


3. No other exceptions triggered during the last execution of the sequence.


IMPLEMENTATION NOTE
On all expected implementations, this allows for about
50 µsec of execution time, even with 100 percent cache
misses. This should satisfy any requirement for a 1 msec
timer interrupt rate.


A.6 Revision History


Revision 5.0, May 12, 1992

1. Changed cache block sizes
2. Changed DRAINT to TRAPB
3. Converted to SDML
4. Changed MOVQ to MOV for standard load 16 bit literal
5. Changed NEGS and NEGT instruction qualifiers to match SUBS and SUBT
   qualifiers
6. Modified text describing creation of canonical longword constants


Revision 4.0, August 21, 1991

1. Added Pseudo-op table
2. Typos
3. Change text describing JSR to indicate that PC+displacement*4 calculation will
   produce the low 16 bits of most likely LW target address
4. Change name of NEGz form that operates on F, D, G, S, or T floating types to
   FNEGz
5. Correct Load Literal code form description of sign-extended 32 bit load.
6. Added floating point data format types to Negate section


Revision 3.0, March 2, 1990

1. Add section on prefetch instructions
2. Minor cleanups to match opcodes in rest of document


Revision 2.0, October 4, 1989

1. Renumber R0 as R31, F0 as F31
2. Show new byte inserts
3. Change Freeze-Thaw to LDQ/L-STQ/C


Revision 1.0, May 23, 1989

1. Reorder and add hardware implementation priorities
2. Add aligned byte/word section
3. Add stylized code form section
4. Add timing considerations section


Revision 0.0, March 15, 1989

1. Initial version


Appendix B
IEEE Floating-Point Conformance


A subset of IEEE Standard for Binary Floating-Point Arithmetic (754-1985) is
provided in the Alpha floating-point instructions. This appendix describes how to 
construct a complete IEEE implementation. 


The order of presentation parallels the order of the IEEE specification. 


B.1 Alpha Choices for IEEE Options 


Alpha supports IEEE single and double formats. Optional extended double is not 
supported. 


Alpha hardware supports normal and chopped IEEE rounding modes. IEEE plus
infinity and minus infinity rounding modes can be implemented in hardware or
software.


Alpha hardware does not support optional IEEE software trap enable/disable modes; 
see the following discussion about software support. 


Alpha hardware supports add, subtract, multiply, divide, convert between floating 
formats, convert between floating and integer formats, and compare. Software 
routines support square root, remainder, round to integer in floating-point format, 
and convert binary to/from decimal. 


In the Alpha architecture, copying without change of format is not considered an 
operation. (LDx, CPYSx, and STx do not check for non-finite numbers; an operation 
would.) Compilers may generate ADDx F31,Fx,Fy to get the opposite effect. 


Optional operations for differing formats are not provided. 


The Alpha choice is that the accuracy provided will meet or exceed IEEE standard 
requirements. It is implementation-dependent whether the software binary/decimal 
conversions beyond 9 or 17 digits treat any excess digits as zeros. 


Overflow and underflow, NaNs, and infinities encountered during software binary to
decimal conversion return strings that specify the conditions. Such strings can be 
truncated to their shortest unambiguous length. 


Alpha hardware supports comparisons of same-format numbers. Software supports 
comparisons of different-format numbers. 


In the Alpha architecture, results are true-false in response to a predicate. 


Alpha hardware supports the required six predicates and the optional unordered 
predicate. The other 19 optional predicates can be constructed from sequences of 
two comparisons and two branches. 
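
As an illustration of building an optional predicate from two comparisons and two branches, the sketch below models the four compare results with Python functions named after CMPTEQ, CMPTLT, CMPTLE, and CMPTUN. This is a simplification: it models only the true/false result of each comparison and none of the exception signaling, and the two derived predicate names are hypothetical.

```python
import math


def cmptun(a, b):
    """Unordered: true if either operand is a NaN."""
    return math.isnan(a) or math.isnan(b)


def cmpteq(a, b):
    return a == b


def cmptlt(a, b):
    return a < b


def cmptle(a, b):
    return a <= b


def unordered_or_greater(a, b):
    """Optional predicate built from CMPTUN and CMPTLT, one branch after each."""
    if cmptun(a, b):        # first comparison, first branch
        return True
    return cmptlt(b, a)     # second comparison, second branch


def less_or_greater(a, b):
    """Optional predicate built from two CMPTLT comparisons."""
    if cmptlt(a, b):
        return True
    return cmptlt(b, a)
```

With a NaN operand, `unordered_or_greater` is true and `less_or_greater` is false, which is the distinction the optional predicates exist to express.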


Alpha hardware supports infinity arithmetic only by trapping when an infinity
operand is encountered and when an infinity is to be created from finite operands
by overflow or division by zero. A software trap handler (interposed between the
hardware and the IEEE user) provides correct infinity arithmetic.


Alpha hardware supports NaNs only by trapping when a NaN operand is 
encountered and when a NaN is to be created. A software trap handler (interposed 
between the hardware and the IEEE user) provides correct signaling and quiet NaN
behavior. 


In the Alpha architecture, Quiet NaNs do not afford retrospective diagnostic
information.


In the Alpha architecture, copying a Signaling NaN without a change of format does
not signal an invalid exception (LDx, CPYSx, and STx do not check for non-finite
numbers). Compilers may generate ADDx F31,Fx,Fy to get the opposite effect.


Alpha hardware fully supports negative zero operands, and follows the IEEE rules 
for creating negative zero results. 


Alpha hardware does not supply IEEE exception trap behavior; the hardware traps 
are a superset of the IEEE-required conditions. A software trap handler (interposed 
between the hardware and the IEEE user) provides correct IEEE exception behavior. 


In the Alpha architecture, tininess is detected by hardware after rounding, and loss 
of accuracy is detected by software as an inexact result. 


In the Alpha architecture, user trap handlers will be supported by compilers and 
a software trap handler (interposed between the hardware and the IEEE user), as 
described in the next section. 


B.2 Alpha Hardware Support of Software Exception Handlers


In Alpha instructions, hardware trap behavior is determined only at compile time; 
short of recompiling, there are no dynamic facilities for changing hardware trap 
behavior.


There is an essential disparity between the Alpha design goal of fast execution and 
the IEEE design goal of exact trap behavior. The Alpha hardware architecture 
provides means for users to choose various degrees of IEEE compliance, at 
appropriate performance cost. 


Instructions compiled without the /Software modifier cannot produce IEEE-compliant
trap behavior, nor can they provide IEEE-compliant non-finite arithmetic.
Trapping and stopping on non-finite operands or results (rather than the IEEE
default of continuing with NaNs propagated) is an Alpha value-added behavior that
some users prefer.


Instructions compiled without the /Underflow hardware trap enable modifier cannot
produce IEEE-compliant underflow trap behavior, nor can they provide IEEE-compliant
denormal results. They are fast and provide true zero (not minus zero)
results whenever underflow occurs. This is an Alpha value-added behavior that
some users prefer.


Instructions compiled without the /Inexact hardware trap enable modifier cannot
produce IEEE-compliant inexact trap behavior. Trapping on Inexact will be painfully
slow; few users appear to prefer this, but they can get it if they really want it.


IEEE floating-point instructions compiled with the /Software modifier produce 
hardware traps and unpredictable values; a software trap handler may then produce 
all IEEE-required behavior. 


IEEE floating-point instructions compiled with the /Underflow enable modifier
produce hardware traps and true zero values for underflow; a software trap handler 
may then produce all IEEE-required behavior. 


IEEE floating-point instructions compiled with the /Inexact enable modifier produce 
hardware traps that allow a software trap handler to produce all IEEE-required
behavior.


Thus, to get full IEEE compliance of all the required features of the standard, users 
must compile with all three options enabled. 


To get the optional full IEEE user trap handler behavior, a software trap handler
must be provided that implements the five exception flags, dynamic user trap handler
disabling, handler saving and restoring, default behavior for disabled user trap
handlers, and linkages that allow a user handler to return a substitute result.


Also, users must insert a TRAPB in every basic block with a floating operation that 
can potentially trap, so that a software handler has an opportunity to scale the true 
result by 2**192 or 2**1536, as appropriate for enabled user trap handlers; and to
supply the default +/-infinity, +/-MAX, +/-MIN, denormal, or zero as appropriate
for disabled user trap handlers. 
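
The scaling step can be illustrated for a T_floating (double-precision) multiply. The sketch below is only a model of what a software handler might compute from the original operands: it assumes the scale factor 2**1536 for the double format (2**192 would apply to S_floating, which Python floats do not model), and the function name is hypothetical.

```python
import math

T_SCALE = 1536   # exponent adjustment for the double format


def scaled_product(a, b, overflowed):
    """Return a*b scaled by 2**-1536 (overflow) or 2**+1536 (underflow).

    The mantissas are multiplied separately from the exponents, so the
    intermediate result never leaves the representable range.
    """
    mant_a, exp_a = math.frexp(a)
    mant_b, exp_b = math.frexp(b)
    adjust = -T_SCALE if overflowed else T_SCALE
    return math.ldexp(mant_a * mant_b, exp_a + exp_b + adjust)


big = scaled_product(1e200, 1e200, overflowed=True)
tiny = scaled_product(1e-200, 1e-200, overflowed=False)
```

Here `1e200 * 1e200` overflows to infinity and `1e-200 * 1e-200` underflows to zero when computed directly, while both scaled results are ordinary finite numbers that a user handler could inspect.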


B.3 Mapping to IEEE Standard


There are five IEEE exceptions, each of which can be “IEEE software trap-enabled” 
or disabled (the default condition). Implementing the IEEE software trap-enabled 
mode is optional in the IEEE standard. 


Our assumption, therefore, is that the only access to IEEE-specified software trap- 
enabled results will be generated in assembly language code. The following design 
allows this, but only if such assembly language code has TRAPB instructions after 
each floating-point instruction, and generates the IEEE-specified scaled result in a 
trap handler by emulating the instruction that was trapped by hardware overflow 
/underflow detection, using the original operands.


There is a set of detailed IEEE-specified result values, both for operations that are
specified to raise IEEE traps and those that do not. This behavior is created on
Alpha by four layers of hardware, PALcode, the operating-system trap handler, and
the user IEEE trap handler, as shown in Figure B-1.


Figure B-1: IEEE Trap Handling Behavior

    Hardware
        Traps to PALcode
    PALcode
        Traps to Operating System
    Optional System
        Traps to User IEEE Trap Handler (IEEE Standard)
    User Condition Handler


The IEEE-specified trap behavior occurs only with respect to the user IEEE trap 
handler (the last layer in Figure B—1); any trap-and-fixup behavior in the first three 
layers is outside the scope of the IEEE standard. 


The IEEE number system is divided into finite and non-finite numbers:

• The finites are normal numbers:
  -MAX..-MIN, -0, 0, +MIN..+MAX

• The non-finites are:
  Denormals, +/-Infinity, Signaling NaN, Quiet NaN


Alpha hardware must treat minus zero operands and results as special cases, as 
required by the IEEE standard.


Table B-1 specifies, for the IEEE /Software modes, which layer does each piece of
trap handling. See Common Architecture, Chapter 4 for more detail on the hardware
instruction descriptions.
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Table B—1: 


Alpha Instructions 


FBEQ FBNE FBLT FBLE FBGT 
FBGE 


LDS LDT 
STS STT 
CPYS CPYSN 
FCMOVx 


ADDx SUBx INPUT Exceptions 


Denormal operand 
+/-Inf operand 
QNaN operand 
SNaN operand 


+Inf + —Inf 


ADDx SUBx OUTPUT Exceptions 


Exponent overflow 


Exponent underflow 
and disabled 


Exponent underflow 
and enabled 


Inexact and disabled 
in the instruction 


Inexact and enabled 
in the instruction 


IEEE Floating-Point Trap Handling 


Hardware 


PAL Handler 


Os 
Trap 


Bits Only—No Exceptions 


Bits Only—No Exceptions 


Bits Only—No Exceptions 
Bits Only—No Exceptions 
Bits Only—No Exceptions 


Trap 
Trap 


Trap 


Trap 


Trap 


Supply 
+0 


Supply 
+0 and 
trap 


Trap 


Trap 


Trap 


Trap 


Supply 
sum 


Supply 
sum 


Supply 
QNaN 


Supply 
aN 


Supply 
QNaN 


Supply 
+/—Inf 


+/-MAX 


Supply 


+/—-MIN — 


denorm 
+/—0 


User 
Software 
Handler 


[Invalid Op] 


[Invalid Op] 


[Overflow] 

Scale by 

2**Alpha 
1 


[Underflow] 
Scale by 
2**Alpha 


[Inexact] 


1An implementation could onscee instead to trap to PALcode and have the PALcode supply a zero result on all 


underflows. 
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Table B~1 (Cont.): IEEE Floating-Point Trap Handling 


Alpha Instructions 
MULx INPUT Exceptions 


Denormal operand | 
+/-Inf operand 
QNaN operand 
SNaN operand 


0 * Inf 


MULx OUTPUT Exceptions 


Exponent overflow 


Exponent underflow 
and disabled 


Exponent underflow 
and enabled 


Inexact and disabled 


Inexact and enabled 


DIVx INPUT Exceptions 


Denormal operand 
+/-Inf operand 
QNaN operand 
SNaN operand 


0/0 or Inf/Inf 
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| Hardware 


Trap 


Trap 


Trap 
Trap 


Trap 


Trap 
Supply 
+0 


Supply 
+0 and 
Trap — 


Trap 


Trap 
Trap 
Trap 


Trap 


Trap 


PAL 


Trap 
Trap 


Trap 


Trap 


| Trap 


Trap 


Trap. 


Trap 


Trap 
Trap 
Trap 


Trap 


| Trap 


Os) 


| Trap 
Handler 


Supply 


prod. 


Supply 


prod. 


Supply 
QNaN 


Supply 
QNaN 


Supply 


QNaN 


Supply 


+/—Inf 
+/—~MAX 


Supply 

+/—MIN 
denorm 
+/—Q 


Supply 


quot. 


Supply 
quot. 


Supply | 


QNaN 


Supply 
QNaN 


Supply 


QNaN > 


User 
Software 


Handler 


[Invalid Op] 


{Invalid Op] 


[Overflow] 
Scale by 
2** Alpha 


[Underflow] 
Scale by 
2**Alpha | 


[Inexact] 


[Invalid Op] 


[Invalid Op] 


istribution 
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Alpha Instructions 
DIVx INPUT Exceptions 
A/O | 


DIVx OUTPUT Exceptions 


Exponent overflow 


Exponent underflow 
and disabled 


Exponent underflow 
and enabled 


_Inexact and disabled 


Inexact and enabled 


Hardware 


Trap 


CMPTEQ CMPTUN INPUT Exceptions 


Denormal operand 


QNaN operand 


SNaN operand 


Trap 


Trap 


Trap 


CMPTLT CMPTLE INPUT Exceptions — 


Denormal operand 
QNaN operand 


SNaN operand 


Trap 


Trap 


Trap 


PAL 


Trap 


Trap 


Trap 


Trap 


Trap 


Trap 
Trap 
Trap 


Trap 


Trap 


OS 
Trap 


Handler 


Supply 
+/—Inf 


Supply © 


+/—Inf 
+/—MAX 


Supply 

+/—MIN 
denorm 
+/—0 


Supply 


Supply 
False 


for EQ, True 


for UN 


Supply 
False/ 
True 


Supply 
(=) 


Supply 
False 


Supply — 
False 


User 
Software 
Handler 


[Div. Zero] 


[Overflow] 
Scale by 
2**Alpha 


[Underflow] 
Scale by 
2** Alpha 


[Inexact] 


[Invalid Op] 


[Invalid Op] 


[Invalid Op] 
Table B-1 (Cont.): IEEE Floating-Point Trap Handling

                                                    OS Trap         User Software
Alpha Instructions           Hardware       PAL     Handler         Handler

CVTfi INPUT Exceptions
  Denormal operand           Trap           Trap    Supply Cvt      -
  +/-Inf operand             Trap           Trap    Supply Cvt      [Invalid Op]
  QNaN operand               Trap           Trap    Supply QNaN     -
  SNaN operand               Trap           Trap    Supply QNaN     [Invalid Op]

CVTfi OUTPUT Exceptions
  Inexact and disabled       -              -       -               -
  Inexact and enabled        Trap           Trap    -               [Inexact]
  Integer overflow           Supply Trunc.  Trap    -               [Invalid Op] (2)
                             result and
                             trap if
                             enabled

CVTif OUTPUT Exceptions
  Inexact and disabled       -              -       -               -
  Inexact and enabled        Trap           Trap    -               [Inexact]

CVTff INPUT Exceptions
  Denormal operand           Trap           Trap    Supply Cvt      -
  +/-Inf operand             Trap           Trap    Supply Cvt      -
  QNaN operand               Trap           Trap    Supply QNaN     -
  SNaN operand               Trap           Trap    Supply QNaN     [Invalid Op]

(2) An implementation could choose instead to trap to PALcode on extreme values
    and have the PALcode supply a truncated result on all overflows.
                                                    OS Trap         User Software
Alpha Instructions           Hardware       PAL     Handler         Handler

CVTff OUTPUT Exceptions
  Exponent overflow          Trap           Trap    Supply +/-Inf   [Overflow]
                                                    +/-MAX          Scale by 2**Alpha
  Exponent underflow         Supply +0      -       -               -
  and disabled
  Exponent underflow         Supply +0      Trap    Supply +/-MIN   [Underflow]
  and enabled                and trap               denorm +/-0     Scale by 2**Alpha
  Inexact and disabled       -              -       -               -
  Inexact and enabled        Trap           Trap    -               [Inexact]


Other IEEE operations (software subroutines or sequences of instructions), are listed 
here for completeness: 


Remainder 

SQRT 

Round float to integer-valued float 

Convert binary to/from decimal 

Compare, other combinations than the four above 
Table B-2 shows the IEEE standard charts.


Table B-2: IEEE Standard Charts

                                    IEEE Software        IEEE Software
                                    TRAP Disabled        TRAP Enabled
Exception                           (IEEE Default)       (Optional)

Invalid Operation
  (1) Input signaling NaN           Quiet NaN
  (2) Mag. subtract Inf.            Quiet NaN
  (3) 0 * Inf.                      Quiet NaN
  (4) 0/0 or Inf/Inf                Quiet NaN
  (5) x REM 0 or Inf REM y          Quiet NaN
  (6) SQRT(negative non-zero)       Quiet NaN
  (7) Cvt to int(ovfl, Inf, NaN)    Quiet NaN
  (8) Compare unordered             Quiet NaN

Division by Zero
  x/0, x finite <>0                 +/-Inf

Overflow
  Round nearest                     +/-Inf               Res/2**192 or 1536
  Round to zero                     +/-MAX               Res/2**192 or 1536
  Round to -Inf                     +MAX/-Inf            Res/2**192 or 1536
  Round to +Inf                     +Inf/-MAX            Res/2**192 or 1536

Underflow                           0/denorm/+/-MIN      Res*2**192 or 1536

Inexact                             Rounded/ovfl         Res
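
The scaling entries in the TRAP Enabled column can be illustrated with a short sketch. It assumes, as the IEEE standard does, that the exponent adjustment is 192 for single-precision (S) and 1536 for double-precision (T) results. The function name and the use of exact fractions are illustrative choices only, and rounding to the destination precision is left out.

```python
from fractions import Fraction

# Exponent adjustment from Table B-2: 192 for S_floating, 1536 for T_floating.
BIAS_ADJUST = {"S": 192, "T": 1536}


def scaled_trap_result(exact, fmt, exception):
    """Scale an exact result the way the TRAP Enabled column describes.

    `exact` is the infinitely precise result as a Fraction (hypothetical
    input for this sketch). Overflow divides by 2**adjust and underflow
    multiplies by 2**adjust.
    """
    factor = Fraction(2) ** BIAS_ADJUST[fmt]
    if exception == "overflow":
        return exact / factor
    if exception == "underflow":
        return exact * factor
    raise ValueError("only overflow and underflow results are scaled")


# 2**1100 is too large for T_floating; the scaled value is 2**(1100 - 1536).
overflowed = scaled_trap_result(Fraction(2) ** 1100, "T", "overflow")
# 2**-150 is too small for S_floating; the scaled value is 2**(192 - 150).
underflowed = scaled_trap_result(Fraction(1, 2 ** 150), "S", "underflow")
```

A user handler can undo the scaling by applying the opposite power of two, which is why the scaled value remains representable in the destination format.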


IEEE software trap handler requirements are as follows: 


Result is unpredictable unless supplied by any handler. 

Determine which exceptions occurred. 

Determine the kind of operation. 

Determine the destination format. 

Overflow/underflow/inexact: the correctly rounded result, including parts that do 
not fit in the format. 

Invalid and divzero: the operand values. 
B.4 \REVISION HISTORY

Revision 5.0, May 12, 1992
1. Reconciled TBDs
2. Changed DRAINT to TRAPB
3. Converted to SDML

Revision 4.0, August 21, 1990
1. Remove input exceptions for -0. This should have been removed in revision 3.0
2. Typos
3. Change 'IEEE user' to 'user IEEE' in section Mapping to IEEE Standard
4. Specified T floating point data type for CMP instructions and eliminated
   '+/-Inf operand' input exception from these instructions

Revision 3.0, March 2, 1990
1. Revise and simplify IEEE trap behavior

Revision 2.0, October 4, 1989
1. Initial version
Appendix C
Instruction Encodings


The encodings for the Alpha instruction set are given in the following sections.
There is one section for each instruction format, followed by a summary of all the
instruction opcodes in a single table.


NOTE
\ To receive a VAX Structure Definition Language (SDL)
file defining the opcodes and function codes send mail to
AD::Alpha_$OPCODES.\


C.1 Memory Format Instructions 


Table C-1 lists the hexadecimal values of the 6-bit opcode field for the Memory
format instructions.


Table C-1: Memory Format Instruction Opcodes

Mnemonic  Opcode      Mnemonic  Opcode      Mnemonic  Opcode
LDA       08          LDAH      09          LDF       20
LDG       21          LDL       28          LDL_L     2A
LDQ       29          LDQ_L     2B          LDQ_U     0B
LDS       22          LDT       23          STF       24
STG       25          STL       2C          STL_C     2E
STQ       2D          STQ_C     2F          STQ_U     0F
STS       26          STT       27
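
As an illustration of how Table C-1 is used, the following sketch decodes a Memory format instruction word. The field positions (opcode in bits <31:26>, Ra in <25:21>, Rb in <20:16>, displacement in <15:0>) are taken from the Memory instruction format described in the instruction-format chapter, not from this appendix, and the function and dictionary names are illustrative.

```python
# Opcode values copied from Table C-1.
MEMORY_OPCODES = {
    0x08: "LDA", 0x09: "LDAH", 0x20: "LDF", 0x21: "LDG", 0x28: "LDL",
    0x2A: "LDL_L", 0x29: "LDQ", 0x2B: "LDQ_L", 0x0B: "LDQ_U", 0x22: "LDS",
    0x23: "LDT", 0x24: "STF", 0x25: "STG", 0x2C: "STL", 0x2E: "STL_C",
    0x2D: "STQ", 0x2F: "STQ_C", 0x0F: "STQ_U", 0x26: "STS", 0x27: "STT",
}


def decode_memory_format(word):
    """Split a 32-bit word into (mnemonic, Ra, Rb, signed displacement).

    The mnemonic is None when the opcode is not in Table C-1.
    """
    opcode = (word >> 26) & 0x3F
    ra = (word >> 21) & 0x1F
    rb = (word >> 16) & 0x1F
    disp = word & 0xFFFF
    if disp & 0x8000:          # sign-extend the 16-bit displacement
        disp -= 0x10000
    return MEMORY_OPCODES.get(opcode), ra, rb, disp


# LDQ R1, 8(R2) assembled by hand: opcode 29, Ra = 1, Rb = 2, disp = 8.
decoded = decode_memory_format(0xA4220008)
```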


Table C-2 lists the hexadecimal values of the 6-bit opcode field and the 16-bit
displacement field for the Memory format instructions that use the displacement
field as a function code. The notation used is oo.ffff, where oo is the 6-bit opcode and
ffff is the 16-bit displacement field.


Table C-2: Memory Format Instructions with a Function Code

Mnemonic  Code        Mnemonic  Code        Mnemonic  Code
FETCH     18.8000     FETCH_M   18.A000     MB        18.4000
RC        18.E000     RPCC      18.C000     RS        18.F000
TRAPB     18.0000
PROGRAMMING NOTE
The code points 18.4400, 18.4800, and 18.4C00 must
operate as Memory Barrier instructions (MB 18.4000).
Software will currently only use the 18.4000 code point
for MB. This allows a weaker memory barrier to be
added.
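
The note above can be sketched as a decoder for opcode 18 that folds the three extra code points into MB. The function codes come from Table C-2; the placement of the opcode in bits <31:26> and of the function code in bits <15:0> is assumed from the Memory instruction format, and the names are illustrative.

```python
# Function codes from Table C-2 (opcode 18, displacement field used as function).
MISC_FUNCTIONS = {
    0x8000: "FETCH", 0xA000: "FETCH_M", 0x4000: "MB", 0xE000: "RC",
    0xC000: "RPCC", 0xF000: "RS", 0x0000: "TRAPB",
}
# Code points that must behave exactly like MB (18.4000).
MB_ALIASES = {0x4400, 0x4800, 0x4C00}


def decode_misc(word):
    """Return the mnemonic for an opcode-18 word, or None if not recognized."""
    if (word >> 26) & 0x3F != 0x18:
        return None
    function = word & 0xFFFF
    if function in MB_ALIASES:
        return "MB"
    return MISC_FUNCTIONS.get(function)


barrier = decode_misc(0x60004400)   # alias code point, treated as MB
```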


Table C-3 lists the hexadecimal values of the high-order two bits of the displacement
field for the Memory format branch instructions. The notation used is oo.h, where
oo is the 6-bit opcode and h is the high-order two bits of the displacement field.


Table C-3: Memory Format Branch Instruction Opcodes

Mnemonic  Code        Mnemonic  Code        Mnemonic        Code
JMP       1A.0        JSR       1A.1        JSR_COROUTINE   1A.3
RET       1A.2
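
A minimal sketch of the oo.h notation: the jump type is read from the two high-order bits of the 16-bit displacement field, which places it in bits <15:14> of the instruction word (an assumption based on the Memory instruction format). The names are illustrative.

```python
# High-order two bits of the displacement field, from Table C-3.
JUMP_TYPES = {0: "JMP", 1: "JSR", 2: "RET", 3: "JSR_COROUTINE"}


def decode_jump(word):
    """Return the jump mnemonic for an opcode-1A word, or None otherwise."""
    if (word >> 26) & 0x3F != 0x1A:
        return None
    return JUMP_TYPES[(word >> 14) & 0x3]


# Opcode 1A, Ra = 31, Rb = 26, high displacement bits = 2, low bits = 1.
kind = decode_jump(0x6BFA8001)
```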


C.2 Branch Format Instructions 


Table C-4 lists the hexadecimal values of the 6-bit opcode field for the Branch format
instructions.


Table C-4: Branch Format Instruction Opcodes

Mnemonic  Opcode      Mnemonic  Opcode      Mnemonic  Opcode
BR        30          FBEQ      31          FBLT      32
FBLE      33          BSR       34          FBNE      35
FBGE      36          FBGT      37          BLBC      38
BEQ       39          BLT       3A          BLE       3B
BLBS      3C          BNE       3D          BGE       3E
BGT       3F


C.3 Operate Format Instructions 


Table C-5 lists the hexadecimal values of the 6-bit opcode field and the 7-bit function
code field for the Operate format instructions. The notation used is oo.ff, where oo is
the 6-bit opcode and ff is the 7-bit function code field.


Table C-5: Operate Format Instruction Opcodes and Function Codes 


Mnemonic  Code        Mnemonic  Code        Mnemonic  Code

ADDL      10.00       ADDL/V    10.40       ADDQ      10.20
ADDQ/V    10.60       CMPBGE    10.0F       CMPEQ     10.2D
CMPLE     10.6D       CMPLT     10.4D       CMPULE    10.3D
CMPULT    10.1D       SUBL      10.09       SUBL/V    10.49
SUBQ      10.29       SUBQ/V    10.69
S4ADDL    10.02       S4ADDQ    10.22       S4SUBL    10.0B
S4SUBQ    10.2B       S8ADDL    10.12       S8ADDQ    10.32
S8SUBL    10.1B       S8SUBQ    10.3B

AND       11.00       BIC       11.08       BIS       11.20
CMOVEQ    11.24       CMOVLBC   11.16       CMOVLBS   11.14
CMOVGE    11.46       CMOVGT    11.66       CMOVLE    11.64
CMOVLT    11.44       CMOVNE    11.26       EQV       11.48
ORNOT     11.28       XOR       11.40

EXTBL     12.06       EXTLH     12.6A       EXTLL     12.26
EXTQH     12.7A       EXTQL     12.36       EXTWH     12.5A
EXTWL     12.16       INSBL     12.0B       INSLH     12.67
INSLL     12.2B       INSQH     12.77       INSQL     12.3B
INSWH     12.57       INSWL     12.1B       MSKBL     12.02
MSKLH     12.62       MSKLL     12.22       MSKQH     12.72
MSKQL     12.32       MSKWH     12.52       MSKWL     12.12
SLL       12.39       SRA       12.3C       SRL       12.34
ZAP       12.30       ZAPNOT    12.31

MULL      13.00       MULL/V    13.40       MULQ      13.20
MULQ/V    13.60       UMULH     13.30
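
To show how an oo.ff entry from Table C-5 turns into an instruction word, the sketch below assembles a register-form Operate instruction. The field layout (opcode <31:26>, Ra <25:21>, Rb <20:16>, function <11:5>, Rc <4:0>, with bits <15:12> zero) is assumed from the Operate instruction format described elsewhere in the manual, and the function name is illustrative.

```python
def encode_operate(code, ra, rb, rc):
    """Assemble a register-form Operate instruction from an 'oo.ff' code.

    `code` is a Table C-5 entry such as "10.20" (ADDQ). Literal-form
    operands are not handled in this sketch.
    """
    opcode_hex, function_hex = code.split(".")
    opcode = int(opcode_hex, 16)
    function = int(function_hex, 16)
    for register in (ra, rb, rc):
        if not 0 <= register <= 31:
            raise ValueError("register numbers are 0..31")
    return (opcode << 26) | (ra << 21) | (rb << 16) | (function << 5) | rc


addq = encode_operate("10.20", 1, 2, 3)      # ADDQ R1, R2, R3
bis = encode_operate("11.20", 31, 31, 31)    # BIS R31, R31, R31
```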


C.4 Floating-Point Operate Format 


Table C-6 lists the hexadecimal values of the 11-bit function code field for the
Floating-point Operate format instructions that are data type independent. The
6-bit opcode for these instructions is 17 (hexadecimal).


Table C-6: Function Codes for Floating Data Type Independent Operations

Mnemonic  Code        Mnemonic  Code        Mnemonic  Code
CPYS      020         CPYSE     022         CPYSN     021
CVTLQ     010         CVTQL     030         CVTQL/SV  530
CVTQL/V   130
FCMOVEQ   02A         FCMOVGE   02D         FCMOVGT   02F
FCMOVLE   02E         FCMOVLT   02C         FCMOVNE   02B
MF_FPCR   025         MT_FPCR   024
C.4.1 IEEE Floating-Point Instructions


Table C-7 lists the hexadecimal value of the 11-bit function code field for the
IEEE floating-point instructions, with and without qualifiers. The opcode for these
instructions is 16 (hexadecimal).


Table C-7: IEEE Floating-Point Instruction Function Codes

          None   /C     /M     /D     /U     /UC    /UM    /UD
ADDS      080    000    040    0C0    180    100    140    1C0
ADDT      0A0    020    060    0E0    1A0    120    160    1E0
CMPTEQ    0A5
CMPTLT    0A6
CMPTLE    0A7
CMPTUN    0A4
CVTQS     0BC    03C    07C    0FC
CVTQT     0BE    03E    07E    0FE
CVTTS     0AC    02C    06C    0EC    1AC    12C    16C    1EC
DIVS      083    003    043    0C3    183    103    143    1C3
DIVT      0A3    023    063    0E3    1A3    123    163    1E3
MULS      082    002    042    0C2    182    102    142    1C2
MULT      0A2    022    062    0E2    1A2    122    162    1E2
SUBS      081    001    041    0C1    181    101    141    1C1
SUBT      0A1    021    061    0E1    1A1    121    161    1E1

          /SU    /SUC   /SUM   /SUD   /SUI   /SUIC  /SUIM  /SUID
ADDS      580    500    540    5C0    780    700    740    7C0
ADDT      5A0    520    560    5E0    7A0    720    760    7E0
CMPTEQ    5A5
CMPTLT    5A6
CMPTLE    5A7
CMPTUN    5A4
CVTQS                                 7BC    73C    77C    7FC
CVTQT                                 7BE    73E    77E    7FE
CVTTS     5AC    52C    56C    5EC    7AC    72C    76C    7EC
DIVS      583    503    543    5C3    783    703    743    7C3
DIVT      5A3    523    563    5E3    7A3    723    763    7E3
MULS      582    502    542    5C2    782    702    742    7C2
MULT      5A2    522    562    5E2    7A2    722    762    7E2
SUBS      581    501    541    5C1    781    701    741    7C1
SUBT      5A1    521    561    5E1    7A1    721    761    7E1

          None   /C     /V     /VC    /SV    /SVC   /SVI   /SVIC
CVTTQ     0AF    02F    1AF    12F    5AF    52F    7AF    72F

          /D     /VD    /SVD   /SVID  /M     /VM    /SVM   /SVIM
CVTTQ     0EF    1EF    5EF    7EF    06F    16F    56F    76F


PROGRAMMING NOTE
Since underflow cannot occur for CMPTxx, there is no
difference in function or performance between CMPTxx/S
and CMPTxx/SU. It is intended that software
generate CMPTxx/SU in place of CMPTxx/S.
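
The entries in Table C-7 follow a regular pattern, which the sketch below uses to rebuild a function code from its parts. The split into a trap field (bits <10:8>), rounding field (bits <7:6>), source-type field (bits <5:4>), and operation field (bits <3:0>) is inferred from the table values themselves rather than stated in this appendix, so treat it as an assumption. All names are illustrative.

```python
# Field values inferred from the pattern of codes in Table C-7.
OPERATION = {"ADD": 0x0, "SUB": 0x1, "MUL": 0x2, "DIV": 0x3,
             "CMPUN": 0x4, "CMPEQ": 0x5, "CMPLT": 0x6, "CMPLE": 0x7,
             "CVTS": 0xC, "CVTT": 0xE, "CVTQ": 0xF}
SOURCE = {"S": 0x0, "T": 0x2, "Q": 0x3}
ROUNDING = {"C": 0x0, "M": 0x1, "": 0x2, "D": 0x3}   # "" = no rounding qualifier
TRAP = {"": 0x0, "U": 0x1, "V": 0x1, "SU": 0x5, "SV": 0x5,
        "SUI": 0x7, "SVI": 0x7}


def ieee_function_code(operation, source, trap="", rounding=""):
    """Compose an 11-bit IEEE function code from its qualifier fields."""
    return (TRAP[trap] << 8) | (ROUNDING[rounding] << 6) \
        | (SOURCE[source] << 4) | OPERATION[operation]


addt_sud = ieee_function_code("ADD", "T", trap="SU", rounding="D")
cvttq_svic = ieee_function_code("CVTQ", "T", trap="SVI", rounding="C")
```

Under this reading, CVTQS is operation CVTS with source type Q, and CVTTQ is operation CVTQ with source type T.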


C.4.2 VAX Floating-Point Instructions 


Table C-8 lists the hexadecimal value of the 11-bit function code field for the VAX
floating-point instructions. The opcode for these instructions is 15 (hexadecimal).


Table C-8: VAX Floating-Point Instruction Function Codes

          None   /C     /U     /UC    /S     /SC    /SU    /SUC
ADDF      080    000    180    100    480    400    580    500
CVTDG     09E    01E    19E    11E    49E    41E    59E    51E
ADDG      0A0    020    1A0    120    4A0    420    5A0    520
CMPGEQ    0A5                         4A5
CMPGLT    0A6                         4A6
CMPGLE    0A7                         4A7
CVTGF     0AC    02C    1AC    12C    4AC    42C    5AC    52C
CVTGD     0AD    02D    1AD    12D    4AD    42D    5AD    52D
CVTQF     0BC    03C
CVTQG     0BE    03E
DIVF      083    003    183    103    483    403    583    503
DIVG      0A3    023    1A3    123    4A3    423    5A3    523
MULF      082    002    182    102    482    402    582    502
MULG      0A2    022    1A2    122    4A2    422    5A2    522
SUBF      081    001    181    101    481    401    581    501
SUBG      0A1    021    1A1    121    4A1    421    5A1    521

          None   /C     /V     /VC    /S     /SC    /SV    /SVC
CVTGQ     0AF    02F    1AF    12F    4AF    42F    5AF    52F
C.5 Opcode Summary


Table C-9 lists all Alpha opcodes from 00 (CALL_PAL) through 3F (BGT). In the
table, the column headings appearing over the instructions have a granularity of
8 (hexadecimal). The rows beneath the leftmost column supply the individual hex
number to resolve that granularity.


If an instruction column has a 0 in the right (low) hexadecimal digit, replace that 0
with the number to the left of the slash in the leftmost column on the instruction's row.

If an instruction column has an 8 in the right (low) hexadecimal digit, replace that
8 with the number to the right of the slash in the leftmost column.


For example, the third row (2/A) under the 10 (hexadecimal) column contains the
symbol INTS*, representing all the integer subtract instructions. The opcode for those
instructions is 12 (hexadecimal), because the 0 in 10 is replaced by the 2 in the
leftmost column. Likewise, the third row under the 18 (hexadecimal) column contains
the symbol JSR*, representing all jump instructions. The opcode for those instructions
is 1A because the 8 in the heading is replaced by the number to the right of the slash
in the leftmost column.
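
The lookup rule just described can be written as a few lines of code. This is only a sketch of the substitution rule; the function name and string-based arguments are illustrative choices.

```python
def summary_opcode(column_heading, row_label):
    """Combine a column heading (e.g. "18") with a row label (e.g. "2/A").

    A heading ending in 0 takes the digit left of the slash; a heading
    ending in 8 takes the digit right of the slash.
    """
    low_digit, high_digit = row_label.split("/")
    column = int(column_heading, 16)
    digit = low_digit if (column & 0xF) == 0 else high_digit
    return (column & 0xF0) | int(digit, 16)


ints_opcode = summary_opcode("10", "2/A")   # INTS* entry
jsr_opcode = summary_opcode("18", "2/A")    # JSR* entry
```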


The instruction format is listed under the instruction symbol.


The symbols in Table C-9 are explained in Table C—10. 
Table C-9: Opcode Summary

       00      08      10      18      20      28      30      38

0/8    PAL*    LDA     INTA*   MISC*   LDF     LDL     BR      BLBC
       (pal)   (mem)   (op)    (mem)   (mem)   (mem)   (br)    (br)
1/9    Res     LDAH    INTL*   \PAL\   LDG     LDQ     FBEQ    BEQ
               (mem)   (op)            (mem)   (mem)   (br)    (br)
2/A    Res     Res     INTS*   JSR*    LDS     LDL_L   FBLT    BLT
                       (op)    (mem)   (mem)   (mem)   (br)    (br)
3/B    Res     LDQ_U   INTM*   \PAL\   LDT     LDQ_L   FBLE    BLE
               (mem)   (op)            (mem)   (mem)   (br)    (br)
4/C    Res     Res     Res     Res     STF     STL     BSR     BLBS
                                       (mem)   (mem)   (br)    (br)
5/D    Res     Res     FLTV*   \PAL\   STG     STQ     FBNE    BNE
                       (op)            (mem)   (mem)   (br)    (br)
6/E    Res     Res     FLTI*   \PAL\   STS     STL_C   FBGE    BGE
                       (op)            (mem)   (mem)   (br)    (br)
7/F    Res     STQ_U   FLTL*   \PAL\   STT     STQ_C   FBGT    BGT
               (mem)   (op)            (mem)   (mem)   (br)    (br)


Table C-10: Key to Opcode Summary (Table C-9)

Symbol    Meaning
FLTI*     IEEE floating-point instruction opcodes
FLTL*     Floating-point Operate instruction opcodes
FLTV*     VAX floating-point instruction opcodes
INTA*     Integer arithmetic instruction opcodes
INTL*     Integer logical instruction opcodes
INTM*     Integer multiply instruction opcodes
INTS*     Integer subtract instruction opcodes
JSR*      Jump instruction opcodes
MISC*     Miscellaneous instruction opcodes
PAL*      PALcode instruction (CALL_PAL) opcodes
\PAL\     Reserved for PALcode
Res       Reserved for Digital
C.6 OpenVMS PALcode Format Instructions


Sections C.6.1 and C.6.2 list the OpenVMS Alpha unprivileged and privileged
PALcode function codes.

C.6.1 Unprivileged OpenVMS PALcode Function Codes


Table C-11 lists the hexadecimal values of the 26-bit function code field for the
unprivileged OpenVMS PALcode format instructions. The 6-bit opcode for the
PALcode instructions is zero.


Table C-11: Unprivileged OpenVMS PALcode Function Codes

Mnemonic    Code      Mnemonic    Code      Mnemonic    Code

AMOVRM      00A1      AMOVRR      00A0      BPT         0080
BUGCHK      0081      CHME        0082      CHMK        0083
CHMS        0084      CHMU        0085      GENTRAP     00AA
IMB         0086      INSQHIL     0087      INSQHILR    00A2
INSQHIQ     0089      INSQHIQR    00A4      INSQTIL     0088
INSQTILR    00A3      INSQTIQ     008A      INSQTIQR    00A5
INSQUEL     008B      INSQUEL/D   008D      INSQUEQ     008C
INSQUEQ/D   008E      PROBER      008F      PROBEW      0090
RD_PS       0091      READ_UNQ    009E      REI         0092
REMQHIL     0093      REMQHILR    00A6      REMQHIQ     0095
REMQHIQR    00A8      REMQTIL     0094      REMQTILR    00A7
REMQTIQ     0096      REMQTIQR    00A9      REMQUEL     0097
REMQUEL/D   0099      REMQUEQ     0098      REMQUEQ/D   009A
RSCC        009D      SWASTEN     009B      WRITE_UNQ   009F
WR_PS_SW    009C





C.6.2 Privileged OpenVMS PALcode Function Codes


Table C-12 lists the hexadecimal values of the 26-bit function code field for the
privileged OpenVMS PALcode format instructions. The 6-bit opcode for the PALcode
instructions is zero.


Table C-12: Privileged OpenVMS PALcode Function Codes

Mnemonic      Code      Mnemonic      Code      Mnemonic      Code

CFLUSH        0001      DRAINA        0002      HALT          0000
LDQP          0003

MFPR_ASN      0006      MFPR_ASTEN    0026      MFPR_ASTSR    0027
MFPR_ESP      001E      MFPR_FEN      000B      MFPR_IPL      000E
MFPR_MCES     0010      MFPR_PCBB     0012      MFPR_PRBR     0013
MFPR_PTBR     0015      MFPR_SCBB     0016      MFPR_SISR     0019
MFPR_SSP      0020      MFPR_TBCHK    001A      MFPR_USP      0022
MFPR_VPTB     0029      MFPR_WHAMI    003F

MTPR_ASTEN    0007      MTPR_ASTSR    0008      MTPR_DATFX    002E
MTPR_ESP      001F      MTPR_FEN      000C      MTPR_IPIR     000D
MTPR_IPL      000F      MTPR_MCES     0011      MTPR_PERFMON  002B
MTPR_PRBR     0014      MTPR_SCBB     0017      MTPR_SIRR     0018
MTPR_SSP      0021      MTPR_TBIA     001B      MTPR_TBIAP    001C
MTPR_TBIS     001D      MTPR_TBISD    0024      MTPR_TBISI    0025
MTPR_USP      0023      MTPR_VPTB     002A

STQP          0004      SWPCTX        0005      unused        0009
unused        000A


C.7 Unprivileged OSF/1 PALcode Function Codes


Table C-13 lists the hexadecimal values of the 26-bit function code field for
the unprivileged OSF/1 PALcode instructions. The 6-bit opcode for the PALcode
instructions is zero.


Table C-13: Unprivileged OSF/1 PALcode Function Codes

Mnemonic  Code        Mnemonic  Code        Mnemonic  Code
bpt       0080        bugchk    0081        callsys   0083
gentrap   00AA        imb       0086        rdunique  009E
wrunique  009F


C.8 Privileged OSF/1 PALcode Function Codes


Table C-14 lists the hexadecimal values of the 26-bit function code field for
the privileged OSF/1 PALcode instructions. The 6-bit opcode for the PALcode
instructions is zero.


Table C-14: Privileged OSF/1 PALcode Function Codes

Mnemonic  Code        Mnemonic  Code        Mnemonic  Code

halt      0000        rdps      0036        rdusp     003A
rdval     0032        retsys    003D        rti       003F
swpctx    0030        swpipl    0035        tbi       0033
whami     003C        wrent     0034        wrfen     002B
wrkgp     0037        wrusp     0038        wrval     0031
wrvptptr  002D
C.9 Required PALcode Function Codes

The opcodes listed in Table C-15 are required for all Alpha implementations. The
notation used is oo.ffff, where oo is the hexadecimal 6-bit opcode and ffff is the
hexadecimal 26-bit function code.


Table C-15: Required PALcode Function Codes

Mnemonic   Type           Function Code
DRAINA     Privileged     00.0002
HALT       Privileged     00.0000
IMB        Unprivileged   00.0086
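
A short sketch of how an oo.ffff entry maps to a CALL_PAL instruction word. It assumes the PALcode instruction format places the opcode in bits <31:26> and the 26-bit function code in bits <25:0>; the function name is illustrative.

```python
def encode_call_pal(code):
    """Turn an 'oo.ffff' entry such as "00.0086" into a 32-bit word."""
    opcode_hex, function_hex = code.split(".")
    opcode = int(opcode_hex, 16)
    function = int(function_hex, 16)
    if opcode != 0x00:
        raise ValueError("CALL_PAL uses opcode 00")
    if function >= (1 << 26):
        raise ValueError("function code must fit in 26 bits")
    return (opcode << 26) | function


imb_word = encode_call_pal("00.0086")   # IMB
```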


C.10 Opcodes Reserved to PALcode

The opcodes listed in Table C-16 are reserved for use in implementing PALcode.


Table C-16: Opcodes Reserved for PALcode

Mnemonic  Opcode      Mnemonic  Opcode      Mnemonic  Opcode
PAL19     19          PAL1B     1B          PAL1D     1D
PAL1E     1E          PAL1F     1F


C.11 Opcodes Reserved to Digital

The opcodes listed in Table C-17 are reserved to Digital.


Table C-17: Opcodes Reserved for Digital

Mnemonic  Opcode      Mnemonic  Opcode      Mnemonic  Opcode

OPC01     01          OPC02     02          OPC03     03
OPC04     04          OPC05     05          OPC06     06
OPC07     07          OPC0A     0A          OPC0C     0C
OPC0D     0D          OPC0E     0E          OPC14     14
OPC1C     1C


\PROGRAMMING NOTE (SRM ONLY)
Opcodes 02, 06, 0A, and 0E are nominally reserved for
future extensions to octaword load/store for both integer
and floating-point formats.


For IEEE Floating-point opcode 16 (hexadecimal), if the function
code field bits<5:4> are 01 (binary), or the function code
bits<3:0> are 1101 (binary), then an illegal instruction trap
is taken. This will allow for future additions of the
extended IEEE format. \
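
The reserved-encoding rule in the note can be expressed as a small predicate. This is a sketch of the bit test only; the function name is illustrative, and the argument is the 11-bit function code of an opcode-16 instruction.

```python
def ieee_function_is_reserved(function):
    """True if an opcode-16 function code falls in the reserved space.

    Reserved when bits <5:4> are 01 (binary) or bits <3:0> are 1101 (binary).
    """
    source_bits = (function >> 4) & 0x3
    operation_bits = function & 0xF
    return source_bits == 0b01 or operation_bits == 0b1101


reserved = ieee_function_is_reserved(0x090)   # bits <5:4> = 01
```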
C.12 \REVISION HISTORY

Revision 5.0, May 12, 1992
1. Added note on IEEE floating-point code 16, special function code fields
2. Added DRAINA to list of required PALcode instructions
3. Added ECO #17, #23
4. Converted to SDML
5. Removed /S and /SC opcodes from CVTQF and CVTQG instructions encodings
6. Corrected text by removing extra 'instructions' from Fig. C-3 text
7. Added CMPBGE to Operate format instruction encoding
8. Add opcode for READ_UNQ and WRITE_UNQ

Revision 4.0, March 29, 1991
1. Changed /P to /D
2. Added RSCC opcode
3. Added Scaled Add/Subtract opcodes
4. Removed references to D_float
5. Updated various opcodes per EV-4 request
6. Typos

Revision 3.0, Mar 2, 1990
1. Version 3.0 update

Revision 2.0, October 4, 1989
1. First Pass
Appendix D
Registered System and Processor Identifiers


This appendix contains a registry of Alpha system platform types, system platform
variations, processor types, processor variations, and processor packaging types. See
Platform Section, Chapter 3 for a description of these fields.


\ Send mail to EAGLE1::ALPHA_SRM to register a new Alpha system, platform, or
processor. Note that the Alpha system types are not equivalent to the VAX SYSTYPE
values. \
Table D-1: System and Processor Identification Assignments

System Type        Processor Type        Product Name

1  ADU             1 = EV-3
                   2 = EV-4
2  Cobra           1 = EV-3
                   2 = EV-4
3  Ruby            1 = EV-3
                   2 = EV-4
4  Flamingo        1 = EV-3
                   2 = EV-4
5  Mannequin       3 = Simulation
6  Jensen          2 = EV-4
Table D-2: System Variation Assignments

Bit      Description

0        MPCAP - If set, indicates this system platform is capable of being
         configured as a multiprocessor; all support for multiprocessing is
         present, even if only one processor is present. If clear, this system
         supports a uniprocessor only. Initialized by the console at all cold
         bootstraps.

4:1      CONSOLE - Indicates the type of console. Defined values include:

           <4:1>    Interpretation
           0000     Reserved
           0001     Detached service processor
           0010     Embedded console
           other    Reserved for future use

         Initialized by the console at all cold bootstraps.

7:5      POWERFAIL - Indicates the type of powerfail (if any) implemented by
         this platform. Defined values include:

           <7:5>    Interpretation
           000      Reserved
           001      United
           010      Separate
           011      Full battery backup of system platform hardware

         Initialized by the console at all cold bootstraps.

8        POWERFAIL RESTART - If set, indicates that the console should restart
         all available processors on a powerfail recovery. If clear, only the
         primary processor will be restarted. Cleared by the console at system
         bootstraps; may be set by system software.

9        GRAPHICS - If set, indicates that the platform contains an embedded
         graphics processor. Initialized by the console at all cold bootstraps.

63:10    RESERVED - MBZ
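
The bit assignments in Table D-2 can be unpacked as in the following sketch. The dictionary keys and function name are illustrative; encodings that the table does not list are reported as "Reserved", which is an assumption for the unlisted POWERFAIL values.

```python
CONSOLE_TYPES = {0b0001: "Detached service processor",
                 0b0010: "Embedded console"}
POWERFAIL_TYPES = {0b001: "United",
                   0b010: "Separate",
                   0b011: "Full battery backup of system platform hardware"}


def decode_system_variation(value):
    """Unpack the system variation field described in Table D-2."""
    return {
        "mpcap": bool(value & 0x1),
        "console": CONSOLE_TYPES.get((value >> 1) & 0xF, "Reserved"),
        "powerfail": POWERFAIL_TYPES.get((value >> 5) & 0x7, "Reserved"),
        "powerfail_restart": bool((value >> 8) & 0x1),
        "graphics": bool((value >> 9) & 0x1),
        "reserved_bits_zero": (value >> 10) == 0,   # bits <63:10> are MBZ
    }


# Multiprocessor-capable, embedded console, battery backup, graphics present.
info = decode_system_variation(0x265)
```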




Table D-3: Processor Variation Assignments

Bit      Description

0        VAX-FP - If set, indicates this processor supports VAX Floating-point
         operations and data types. If clear, this processor has no such
         support. Initialized by the console at all cold bootstraps.

1        IEEE-FP - If set, indicates this processor supports IEEE Floating-point
         operations and data types. If clear, this processor has no such
         support. Initialized by the console at all cold bootstraps.

2        PRIMARY ELIGIBLE (PE) - If set, indicates that this processor is
         eligible to become a primary processor. The processor has direct
         access to the console, a BB_WATCH, and all I/O widgets. Initialized by
         the console at all cold bootstraps. See Platform Section, Chapter 4.

63:3     RESERVED - MBZ


D.1 I/O Architecture Section


This section contains the information removed from the I/O chapter previously
located in the Platforms section.


D.1.1 Special Commands 


The special “WHO_ARE_YOU” command (W=0, B=1, CMD=0) is common to all 

bridge implementations. WHO_ARE_YOU is used to determine the type of remote 
bridge side. In response to a mailbox operation with a WHO_ARE_YOU command 
and RBADR of 0, the remote bridge side returns a unique remote bus side identifier. 
All other commands are specific to the type of remote bus and independent of the 
bridge implementation. 


Table D-4: WHO_ARE_YOU returns

Bus          Bridge Type(s)    System    WHO_ARE_YOU returns
XMI          LAMB              Laser     XMI XDEV register
                                           <31:16>   Device revision
                                           <15:0>    102A (hexadecimal)
Futurebus+                     Cobra     Not implemented
             FLAG              Laser     TBD


D.1.1.1 XMI Specific Information


The XMI CMD field definition is given in Table D-5. Bits <39:0> of the RBADR field
are passed unchanged onto the XMI by the remote side. The MASK field is inverted
to form the XMI byte enables.
To access XMI device CSRs, RBADR field bit <31> must be clear and bits <30:29>
must be set. Only longword accesses are supported; MASK bits <7:4> must be set
and WDATA bits <63:32> are ignored by the bridge.
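
The CSR access constraints above can be checked mechanically, as in this sketch. The function names are illustrative, and treating MASK as an 8-bit field when inverting it into byte enables is an assumption.

```python
def xmi_csr_access_ok(rbadr, mask):
    """Check the RBADR and MASK constraints for an XMI device CSR access."""
    bit31_clear = ((rbadr >> 31) & 0x1) == 0
    bits_30_29_set = ((rbadr >> 29) & 0x3) == 0x3
    upper_mask_set = ((mask >> 4) & 0xF) == 0xF
    return bit31_clear and bits_30_29_set and upper_mask_set


def xmi_byte_enables(mask):
    """Invert MASK to form the byte enables (8-bit MASK assumed)."""
    return ~mask & 0xFF


ok = xmi_csr_access_ok(0x60000000, 0xF0)
enables = xmi_byte_enables(0xF0)   # low longword bytes enabled
```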

Table D-5: XMI CMD field

Bit(s)    Name     Description

<3:0>     TRANS    Transaction type:
                     0      undefined
                     1      read longword
                     2-6    undefined
                     7      write longword


D.1.1.2 Futurebus+ Specific Information 


The Futurebus+ CMD field definition is given in Platform Section, Chapter 1. 
RBADR must be longword aligned for longword read or write accesses and quadword 
aligned for quadword write accesses. The MASK field is passed unchanged onto the 
Futurebus+ by the remote side. 


Table D-6: Futurebus+ CMD field

Bit(s)     Name      Description

<3:0>      TC        Transaction code.
                       0      unmasked
                       1      undefined
                       2      partial - byte mask is valid
                       3-7    undefined

<4>        WR        Write transaction.
                       0      Read
                       1      Write

<6:5>      DW        Data width. Note that all widths may not be implemented
                     by the remote side.
                       0      32-bits
                       1      64-bits
                       2      128-bits
                       3      256-bits
                     Cobra and FLAG implement only 32-bit and 64-bit data
                     widths.

<7>        AW        Address width.
                       0      32-bits
                       1      64-bits

<22:16>    F_DIAG    FLAG specific diagnostic bits. See FLAG specification.

<29:23>    C_DIAG    Cobra specific diagnostic bits. See Cobra I/O specification.
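
The field layout in Table D-6 can be packed into a CMD value as sketched below. The function and parameter names are illustrative; widths are passed in bits and translated to the encodings from the table.

```python
DATA_WIDTH_CODES = {32: 0, 64: 1, 128: 2, 256: 3}
ADDRESS_WIDTH_CODES = {32: 0, 64: 1}


def futurebus_cmd(tc, write, data_width, address_width, f_diag=0, c_diag=0):
    """Pack the Futurebus+ CMD fields listed in Table D-6."""
    if not 0 <= tc <= 0xF:
        raise ValueError("TC is a 4-bit field")
    return (tc
            | (int(bool(write)) << 4)
            | (DATA_WIDTH_CODES[data_width] << 5)
            | (ADDRESS_WIDTH_CODES[address_width] << 7)
            | ((f_diag & 0x7F) << 16)
            | ((c_diag & 0x7F) << 23))


# Partial (byte-masked) 64-bit write using 32-bit addressing.
cmd = futurebus_cmd(tc=2, write=True, data_width=64, address_width=32)
```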
D.2 \Revision History

Revision 5.0, May 12, 1992
1. Added XMI and Futurebus+ tables from I/O chapter
2. Added Jensen identifier
3. Added graphics variation bit (9)
\
Appendix E
Registered Console Implementation Functions


This appendix contains a registry of functions as implemented by current consoles.
The first two sections contain the registered environment variables and console
terminal blocks. The remaining sections summarize the functions implemented by
existing consoles.


\ Console functions which vary with implementation are summarized in Platform
Section, Chapter 2. All console implementations and all such implementation-specific
functions must be registered with the Alpha Architecture Group by sending mail to
EAGLE1::ALPHA_SRM.\


E.1 Environment Variables 


Table E-1: Option Environment Variables

Environment Var
ID        Symbol    Notes    Description

40-7F     -                  TBD


E.2 Console Terminal Block Formats 


E.2.1 Serial Line UART 
Console terminal type '02' supports the full functionality of a VT device.


If the terminal interface is shared among multiple physical terminals, the device ID
indicates which physical terminal is the console terminal. If the terminal interface
is not shared, the device ID is zero.


Extended error status may result from the PUTS and GETC console callback
routines. As shown in Figure E-1, extended status is recorded at offsets [64] and [72];
the format is:

   <63:3>   SBZ
   <2>      '1' Data Overrun
            '0' otherwise
   <1>      '1' Framing error
            '0' otherwise
   <0>      '1' Parity error
            '0' otherwise
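
The extended error status format can be decoded as in this brief sketch; the function and key names are illustrative.

```python
def decode_extended_status(status):
    """Decode the PUTS/GETC extended error status quadword."""
    return {
        "data_overrun": bool(status & 0b100),    # bit <2>
        "framing_error": bool(status & 0b010),   # bit <1>
        "parity_error": bool(status & 0b001),    # bit <0>
    }


errors = decode_extended_status(0b101)   # overrun and parity error
```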
SET_TERM_CTL alters only the baud rate at offset [56]. Support for multiple baud
rates is implementation-specific.


Figure E-1: Serial Line UART Format

[Figure: CTB layout drawn 64 bits wide, with bits 63 (T) and 62 (R) at the left
and bit 0 at the right. Fields recoverable from the drawing:

   :CTB    Device Type = '02' *
   +24     Length of Device-Specific Data in Bytes = '060'
   +56     Baud Rate = (19200 | 9600 | 2400 | 1200 | 300) (dec)   (device-specific)
   +64     PUTS Extended Error Status                             (device-specific)
   +72     GETC Extended Error Status                             (device-specific)
           Reserved for Console Use
   +144]

* = Initialized by console, never changed
^ = Initialized by console, updated by console
~ = Initialized by console, updated by system software
T = Set to '1' if transmit interrupts enabled
R = Set to '1' if receive interrupts enabled


E.2.2 Graphic Display with LK Keyboard 


Console terminal type '03' is connected by a serial line UART and supports the LK
keyboard functions as follows:

•  48 graphic keys and spacebar on the typewriter array
•  Numeric keypad
•  Delete, Return, and TAB characters
•  Control-character sequences
•  Shift-key (uppercase) sequences
•  CAPS-LOCK activation including the appropriate turning on and off of LED3,
   the CAPS-LOCK LED
•  METRONOME code, B4 (hexadecimal), used for autorepeat mode
•  Lighting of the LED4 (Hold Screen LED) when output flow control is enabled
   and active
•  Severe error keycodes


All other special keycode operations are unsupported. Unsupported functions also
include the COMPOSE-key and other alternate keycode select mechanisms.


If the interface to the keyboard is shared among multiple devices (e.g. mouse), the
device ID indicates the keyboard unit. If the keyboard interface is not shared, the
device ID is zero.


Figure E-2: Serial Line UART with LK Keyboard Format

[Figure: CTB layout drawn 64 bits wide, with bits 63 and 62 (R) at the left
and bit 0 at the right. Fields recoverable from the drawing:

           Length of Device-Specific Data in Bytes = '0E0'
   +80     (start of device-specific fields)
   +88
   +96     Keyboard State
   +104    Keycode Buffer
   +144
   +152    Reserved for Console Use
   +256]

* = Initialized by console, never changed
^ = Initialized by console, updated by console
~ = Initialized by console, updated by system software
R = Set to '1' if receive interrupts enabled


Keyboard
State     Interpretation                     Default value

<0>       Keyboard error                     0, none
<1>       CTRL sequence in progress          0, none
<2>       Shift-key sequence in progress     0, off
<3>       CAPS LOCK in effect                0, off
<4>       Output flow control enabled        1, enabled
<5>       Output flow control status         0, inactive
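
The Keyboard State bits can be decoded as in the following sketch. The field names are illustrative labels for the table rows, and the default value is simply the table's defaults combined (only bit <4> set).

```python
KEYBOARD_STATE_BITS = [
    "keyboard_error",          # <0>
    "ctrl_in_progress",        # <1>
    "shift_in_progress",       # <2>
    "caps_lock",               # <3>
    "flow_control_enabled",    # <4>
    "flow_control_active",     # <5>
]

# Defaults from the table: every bit clear except output flow control enabled.
DEFAULT_KEYBOARD_STATE = 0b010000


def decode_keyboard_state(state):
    """Return a name -> bool mapping for the Keyboard State quadword."""
    return {name: bool((state >> bit) & 1)
            for bit, name in enumerate(KEYBOARD_STATE_BITS)}


defaults = decode_keyboard_state(DEFAULT_KEYBOARD_STATE)
```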


The keyboard is assumed to be in LK200 mode with default settings as described
in the LK400 Functional Specification, Appendix II. The default key transmission
modes at power up are:


Keyboard Division          Mode
Main Array                 autorepeat
Keypad                     autorepeat
Del                        autorepeat
Return and Tab             down only
Function keys              down only
Lock, A00, A10             down only
Shift, Ctrl, A01, A09      down/up
Cursor keys                autorepeat
6 Basic Editing Keys       down only


Keyclick and bell volumes are 2 (dec). The Ctrl (C99) and
Shift (B99 and B11) keys do not generate clicks.


A REINITIATE KEYBOARD command, FD (hexadecimal), is sent to the keyboard when any of
the following severe errors are encountered during execution of a console terminal
callback routine:


1. TEST MODE ACKNOWLEDGE - B8 (hexadecimal)

2. OUTPUT ERROR - B5 (hexadecimal)

3. INPUT ERROR - B6 (hexadecimal)

4. KEYBOARD LOCKED CONFIRMATION - B7 (hexadecimal)


KEYBOARD_STATE<0> is set to '1' when a POWER-UP keycode, 3D or 3E (hexadecimal), is
received from the keyboard. While KEYBOARD_STATE<0> is set to '1', calls to
GETC or PROCESS_KEYCODE for this unit fail with error status.


SET_TERM_CTL has no effect on any CTB field for this terminal device type.
E.3 Implemented Console Functions


E.3.1 Cobra and Laser Systems


The Cobra and Laser Systems share a common console firmware code base. As such,
most of the implemented functions are common. Common functions are summarized
below.


Table E-2: Cobra and Laser Console Functionality

Function                  Description

HWRPB Version             '2'.

CTB format(s)             Serial line UART (type '02').

Optional Callbacks        PSWITCH implemented; SAVE_ENV and PROCESS_KEYCODE
                          not implemented.

Environment Variables     No implementation-specific environment variables
                          accessible by system software.

BOOTED_DEV format         Device path values consist of six fields as follows:

                          Field            Contents
                          protocol         mscp
                                           scsi
                                           dssi
                                           mop
                          hose             Cobra: Local I/O: 0
                                                  FBus I/O: 1
                                           Laser: hose: 0-3
                          slot             Cobra: FBus node: 0-6
                                           Laser: XMI node: 0-14
                                                  FBus node: 0-14
                          channel          Device channel number (0-n)
                                           (valid only for multiple channel
                                           widgets)
                          remote_address   CI, DSSI, SCSI node number
                          unit             Disk or tape unit number

                          A Cobra example is "mop 1 6 1 0 0" indicating a MOP
                          bootstrap from the first channel of an FNA at the
                          sixth FBus node. A Laser example is "mscp 2 3 0 11 9"
                          indicating a bootstrap from disk unit 9 on an HSC
                          connected to CI node 11 accessed from an XCD at
                          node 3 of an XMI connected to hose 2 of the IOP.

BOOT_DEV format           The number of list elements is TBD.

BOOTDEF_DEV format        TBD.

BOOTED_FILE format        TBD.

BOOT_OSFLAGS format       The value consists of a list of up to four si hex
                          digit flags. Examples are "2,7", ",7", or "c,2,4,b".

CONFIG format             See Figure TBD for Cobra. See Figure TBD for Laser.

Bootstrap media           TBD.

HALT codes                No implementation specific codes.
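
The six-field BOOTED_DEV device path in Table E-2 can be parsed as sketched below. The class and function names are illustrative, and the numeric fields are assumed to be decimal, which the two examples in the table suggest but do not state.

```python
from typing import NamedTuple


class DevicePath(NamedTuple):
    protocol: str
    hose: int
    slot: int
    channel: int
    remote_address: int
    unit: int


def parse_device_path(text):
    """Split a BOOTED_DEV value into its six whitespace-separated fields."""
    fields = text.split()
    if len(fields) != 6:
        raise ValueError("expected six fields: protocol hose slot "
                         "channel remote_address unit")
    return DevicePath(fields[0], *(int(field) for field in fields[1:]))


laser = parse_device_path("mscp 2 3 0 11 9")
cobra = parse_device_path("mop 1 6 1 0 0")
```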


E.3.2 Flamingo System Console Functions


Table E-3: Flamingo Console Functionality

Function                  Description

HWRPB Version             '2'.

CTB format(s)             Graphic Display with LK Keyboard (type '03').

Optional Callbacks        PROCESS_KEYCODE implemented; SAVE_ENV and PSWITCH
                          not implemented.

Environment Variables     No implementation-specific environment variables
                          accessible by system software.

BOOTED_DEV format         TBD.
BOOT_DEV format
BOOTDEF_DEV format
BOOT_OSFLAGS format
BOOTED_FILE format

CONFIG format             See Figure TBD.

Bootstrap media           TBD.

HALT codes                No implementation specific codes.

E.4 \REVISION HISTORY

Revision 5.0, May 12, 1992
1. Added ECO #30
2. Converted to SDML
3. Replace previous Console Chapter with Console ECO #15
4. Includes 3 chapters and two appendices, renumber I/O Chapter
5. Material substantially changed or rearranged
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Aborts, forcing, (DD, 6-6 
Absolute longword queue, (JZ), 2—21 
Absolute quadword queue, (IJ), 2-25 
Access control violation (ACV) fault, (7), 6-10 
has precedence, (IJ), 3-13 
memory protection, (IJ), 3-8 
service routine entry point, (IJ), 6-27 
Access-violation fault, (7), 3-10 
ADDF instruction, (2), 4-88 
ADDG instruction, (J), 4-88 
Add instructions 
See also Floating-point operate 
add longword, (I), 4-23
add quadword, (I), 4-25
add scaled longword, (I), 4-24
add scaled quadword, (I), 4-26
ADDL instruction, (I), 4-23
ADDQ instruction, (I), 4-25
Address space match (ASM)
bit in PTE, (II), 3-4; (III), 3-5
TBIAP register uses, (II), 5-25
virtual cache coherency, (I), 5-4
Address space number (ASN)
defined, (II), 1-2
described, (II), 3-8
in HWPCB, (II), 4-2
privileged context, (II), 2-91
range supported, (II), 3-12
TBCHK register uses, (II), 5-22
TBIS register uses, (II), 5-26
translation buffer with, (II), 3-11
virtual cache coherency, (I), 5-4
Address space number (ASN) register, (II), 5-4
Address translation
algorithm to perform, (II), 3-9
page frame number (PFN), (II), 3-9
page table structure, (II), 3-8
performance enhancements, (II), 3-10
translation buffer with, (II), 3-11
virtual address segment fields, (II), 3-9
ADDS instruction, (I), 4-89
ADDT instruction, (I), 4-89

Aligned byte/word memory accesses, A-11
ALIGNED data objects, (I), 1-9
Alignment
atomic longword, (I), 5-2
atomic quadword, (I), 5-2
data alignment trap, (I), 6-16
data considerations, A-6
double-width data paths, A-1
D_floating, (I), 2-7
F_floating, (I), 2-5
G_floating, (I), 2-6
instruction, A-2
longword, (I), 2-2
longword integer, (I), 2-11
memory accesses, A-11
program counter (PC), (II), 6-6
quadword, (I), 2-3
quadword integer, (I), 2-11
stack, (II), 6-31
S_floating, (I), 2-8
T_floating, (I), 2-10
when data is unaligned, (II), 6-28
Alpha architecture 
See also Conventions 
addressing, (I), 2-1
overview, (I), 1-1
porting operating systems to, (I), 1-1
programming implications, (I), 5-1
registers, (I), 3-1
security, (I), 1-7
Alpha privileged architecture library
See PALcode
AMOVRM (PALcode) instruction, (II), 2-76
AMOVRR (PALcode) instruction, (II), 2-76
AND instruction, (I), 4-37
Arithmetic exceptions
See Arithmetic traps
Arithmetic instructions, (I), 4-22
See also specific arithmetic instructions
Arithmetic left shift instruction, (I), 4-36
Arithmetic trap entry (entArith) register, (III), 1-2, 5-3, 5-4
Arithmetic traps
defined, (II), 6-9; (III), 5-1
described, (II), 6-12
division by zero, (I), 4-63; (II), 6-14; (III), 5-5
F31 as destination, (II), 6-12
inexact result, (I), 4-64; (II), 6-15; (III), 5-5
integer overflow, (I), 4-64; (II), 6-15; (III), 5-5
invalid operation, (I), 4-63; (II), 6-14; (III), 5-5
overflow, (I), 4-63; (II), 6-15; (III), 5-5
program counter (PC) value, (II), 6-14
programming implications for, (I), 5-21
R31 as destination, (II), 6-12
recorded for software, (II), 6-13
REI instruction with, (II), 6-9
service routine entry point, (II), 6-27
system entry for, (III), 5-3, 5-4
TRAPB instruction with, (I), 4-120
underflow, (I), 4-63; (II), 6-15; (III), 5-5
when registers affected by, (II), 6-13
AST enable (ASTEN) register
changing access modes in, (II), 4-3
described, (II), 5-5
in HWPCB, (II), 4-2
interrupt arbitration, (II), 6-35
operation (with ASTs), (II), 4-3
privileged context, (II), 2-91
SWASTEN instruction with, (II), 2-19
AST summary (ASTSR) register
described, (II), 5-7
indicates pending ASTs, (II), 4-3
in HWPCB, (II), 4-2
interrupt arbitration, (II), 6-34
privileged context, (II), 2-91
Asynchronous system traps (AST)
ASTEN/ASTSR registers with, (II), 4-3
initiating, (II), 4-3
interrupt definition, (II), 6-20
service routine entry point, (II), 6-27
with PS register, (II), 4-3
Atomic access, (I), 5-2
Atomic move operations, (II), 2-76
Atomic operations
accessing longword datum, (I), 5-2
accessing quadword datum, (I), 5-2
modifying page table entry, (II), 3-7
updating shared data structures, (I), 5-6
using load locked and store conditional, (I), 5-7
Atomic sequences, A-17
AUTO_ACTION variable, (IV), 2-22


Barrier instructions
shared data structures and, (I), 8-10
use in I/O space read/write ordering, (I), 8-2, 8-8

BB_WATCH, (IV), 3-40

BEQ instruction, (I), 4-17

B field (mailbox), (I), 8-5


BGE instruction, (I), 4-17
BGT instruction, (I), 4-17
BIC instruction, (I), 4-37
BIS instruction, (I), 4-37
BLBC instruction, (I), 4-17
BLBS instruction, (I), 4-17
BLE instruction, (I), 4-17
BLT instruction, (I), 4-17
BNE instruction, (I), 4-17
Boolean instructions, (I), 4-36
logical functions, (I), 4-37
Boolean stylized code forms, A-14
Boot block on disk, (IV), 3-34
BOOTDEF_DEV variable, (IV), 2-22
BOOTED_DEV variable, (IV), 2-22
BOOTED_FILE variable, (IV), 2-23
BOOTED_OSFLAGS variable, (IV), 2-23
BOOTP-UDP/IP network bootstrapping, (IV), 3-40
Bootstrap address space
regions, (IV), 3-9
Bootstrap-in-progress (BIP) processor state flag, (IV), 3-14
Bootstrapping, (IV), 3-1
adding processor while running system, (IV), 3-24
address space at cold, (IV), 3-9
boot block in ROM, (IV), 3-38
boot block on disk, (IV), 3-34
bootstrap address space goals, (IV), 3-45
cold in uniprocessor environment, (IV), 3-5
control to system software, (IV), 3-17
detached console implementations, (IV), 3-45
disk media considerations, (IV), 3-49
from BOOTP-UDP/IP network, (IV), 3-40
from disk, (IV), 3-33
from magtape, (IV), 3-35
from MOP-based network, (IV), 3-39
from ROM, (IV), 3-38
implementation considerations, (IV), 3-42
loading page table space at cold, (IV), 3-10
loading primary image, (IV), 3-33
loading system software, (IV), 3-15
media implementation considerations, (IV),
MEMC Table at cold, (IV), 3-8
multiprocessor, (IV), 3-19
network boot considerations, (IV), 3-50
page table coarseness effect, (IV), 3-46
PALcode loading at cold, (IV), 3-9
processor initialization, (IV), 3-16
reaching the address space, (IV), 3-46
request from system software, (IV), 3-24
ROM boot considerations, (IV), 3-50
state flags, (IV), 3-14
synchronization for multiprocessor, (IV), 3-19
system, (IV), 3-5
warm, (IV), 3-18
BOOT_DEV variable, (IV), 2-22
BOOT_FILE variable, (IV), 2-22
BOOT_OSFLAGS variable, (IV), 2-23
BOOT_RESET variable, (IV), 2-23
bpt (PALcode) instruction, (II), 2-2 
required recognition of, (I), 6—4 
BPT (PALcode) instruction, (IJ), 2—4 
required recognition of, (J), 6—4 
service routine entry point, (IJ), 6-28 
trap information, (IJ), 6-16 
Branch instruction format, (2), 3-10 
Branch instructions, (I), 4-16 


See also Control instructions 
backward conditional, (I), 4-17 
conditional branch, (J), 4—17 
displacement, (I), 4—17 
floating-point, summarized, (I), 4-77 
forward conditional, (I), 4—17 
opcodes for, C—2 
unconditional branch, (I), 4-19
Branch prediction model, (I), 4-15
Branch prediction stack, with BSR instruction, (I), 4-19
Breakpoint exception, initiating, (II), 2-4
Bridge
defined, (I), 8-1
MBPR DON bit with, (I), 8-6
prefetch interrupts, (I), 8-12
with I/O space granularity, (I), 8-7
Bridge special commands, D-4
BR instruction, (I), 4-19
BSR instruction, (I), 4-19
Bugcheck exception, initiating, (II), 2-5
bugchk (PALcode) instruction, (III), 2-3
required recognition of, (I), 6-4
BUGCHK (PALcode) instruction, (II), 2-5
required recognition of, (I), 6-4
service routine entry point, (II), 6-28
trap information, (II), 6-16
Byte data type, (I), 2-1
Byte manipulation instructions, (I), 4-42
See also Extract instructions; Insert instructions; Mask instructions
Byte_within_page field, (II), 3-2; (III), 3-2


C 


Cache coherency
barrier instructions for, (I), 5-20
defined, (I), 5-1
I/O space access, (I), 8-2
in multiprocessor environment, (I), 5-5
memory accesses by devices, (I), 8-17
with DMA, (I), 8-10

Caches
address granularity, (I), 8-14, 8-17
design considerations, A-1
flushing physical page from, (II), 2-84
I-stream considerations, A-5
MB and IMB instructions with, (I), 5-20
requirements for, (I), 5-4
translation buffer conflicts, A-8
virtual, (I), 8-15
with powerfail/recovery, (I), 5-4

callsys (PALcode) instruction, (III), 2-4
entSys with, (III), 5-8
stack frames for, (III), 5-3

CALL_PAL (call privileged architecture library) instruction, (I), 4-114
Canonical form, (I), 4-64
CFLUSH (PALcode) instruction, (II), 2-84
with powerfail, (II), 6-22
Changed datum, (I), 5-5
CHAR_SET variable, (IV), 2-24
CHME (PALcode) instruction, (II), 2-6
service routine entry point, (II), 6-29
trap initiation, (II), 6-17
CHMK (PALcode) instruction, (II), 2-7
service routine entry point, (II), 6-28
trap initiation, (II), 6-17
CHMS (PALcode) instruction, (II), 2-8
service routine entry point, (II), 6-29
trap initiation, (II), 6-17
CHMU (PALcode) instruction, (II), 2-9
service routine entry point, (II), 6-29
trap initiation, (II), 6-17
Clear a register, A-13
Clock
See BB_WATCH
CLOSE console routine, (IV), 2-44
CMD field (mailbox), (I), 8-5
CMOVEQ instruction, (I), 4-38
CMOVGE instruction, (I), 4-38
CMOVGT instruction, (I), 4-38
CMOVLBC instruction, (I), 4-38
CMOVLBS instruction, (I), 4-38
CMOVLE instruction, (I), 4-38
CMOVLT instruction, (I), 4-38
CMOVNE instruction, (I), 4-38
CMPBGE instruction, (I), 4-44
CMPEQ instruction, (I), 4-27
CMPGEQ instruction, (I), 4-91
CMPGLE instruction, (I), 4-91
CMPGLT instruction, (I), 4-91
CMPLE instruction, (I), 4-27
CMPLT instruction, (I), 4-27
CMPTEQ instruction, (I), 4-92
CMPTLE instruction, (I), 4-92
CMPTLT instruction, (I), 4-92
CMPTUN instruction, (I), 4-92
CMPULE instruction, (I), 4-28
CMPULT instruction, (I), 4-28
Code forms, stylized, A-12
Boolean, A-14
load literal, A-13
negate, A-14
NOP, A-13
NOT, A-14
register, clear, A-13
register-to-register move, A-14
Code sequences, A-11
Coherency, cache defined, (I), 5-1
Compare instructions
See also Floating-point operate
compare byte, (I), 4-44
compare integer signed, (I), 4-27
compare integer unsigned, (I), 4-28
Conditional move instructions, (I), 4-38
See also Floating-point operate
CONFIG, (IV), 2-19
Configuration data block, (IV), 2-19
Console
adjusting routine virtual address, (IV), 2-59
architecture requirements, (IV), 1-5
at system restart, (IV), 3-25
at warm bootstrap, (IV), 3-18
close device for access, (IV), 2-44
console I/O mode, (IV), 3-4
data structure linkage, (IV), 2-61
data structures loading at cold boot, (IV),
definition, (IV), 1-1
detached, (IV), 1-2
embedded, (IV), 1-2
environment variables, (IV), 2-22, 2-72
forcing entry to I/O mode, (IV), 3-32
getting character from, (IV), 2-31
I/O device routines, (IV), 2-42
implementation considerations, (IV), 1-4, 2-72

implementations, (IV), 1-2
implemented functions, E-5
internationalization, (IV), 1-5
interprocessor console communications, (IV), 2-68
loading PALcode, (IV), 3-9
loading system software, (IV), 3-15
lock mechanisms, (IV), 1-3
major state transitions, (IV), 3-3
managing console state, (IV), 2-20
messages, (IV), 1-3
miscellaneous routines, (IV), 2-59
multiprocessor boot, (IV), 3-20
open device for access, (IV), 2-47
perform device-specific operations, (IV), 2-45
presentation layer, (IV), 1-3
processor state flags, (IV), 3-15
program I/O mode, (IV), 3-4
read from device, (IV), 2-49
registered implementation functions, E-1
requirements, (IV), 1-2
resetting, (IV), 2-38
RESTORE_TERM routine, (IV), 3-32
SAVE_TERM routine, (IV), 3-31
secondary at multiprocessor boot, (IV), 3-22
security, (IV), 1-5
sending commands to secondary, (IV), 2-70
sending messages to primary, (IV), 2-71
serial number and revision fields, (IV), 2-12
setting terminal controls, (IV), 2-39
setting terminal interrupts, (IV), 2-40
support requirements, (IV), 2-25
translating keycode, (IV), 2-33
write to device, (IV), 2-51
writing characters to, (IV), 2-36
Console, overview, (I), 7-1
Console block storage routines, (IV), 2-75
Console callback routines, (IV), 2-25
CTB describes, (IV), 2-66
data structures, (IV), 2-61
implementation considerations, (IV), 2-74
loading at cold boot, (IV), 3-9
remapping, (IV), 2-64
summary, (IV), 2-27
system software invocation, (IV), 2-27
system software usage, (IV), 2-26
Console environment variables
getting, (IV), 2-54
implementation considerations, (IV), 2-72
loading system software, (IV), 3-15
resetting, (IV), 2-55
routines for, (IV), 2-53
saving, (IV), 2-56
setting, (IV), 2-58
Console I/O mode, (IV), 3-3
forcing entry to, (IV), 3-32
Console initialization mode, (IV), 3-4
Console interface, (IV), 2-1
Console routine block (CRB), (IV), 2-61
initializing, (IV), 2-63
structure, (IV), 2-62
Console terminal block (CTB), (IV), 2-61
described, (IV), 2-29, 2-66
implementation example, E-2
Keyboard example, E-4
structure, (IV), 2-67
Console terminal routines, (IV), 2-28
implementation considerations, (IV), 2-74


Context switching
See also Hardware; Process
defined, (II), 4-1
hardware, (II), 4-2
initiating, (II), 2-90
raising IPL while, (II), 4-4
software, (II), 4-2
Control instructions, (I), 4-15
Control stream DMA, (I), 8-11
Conventions
code examples, (I), 1-10
extents, (I), 1-8
figures, (I), 1-9
instruction format, (I), 3-8
notation, (I), 3-8
numbering, (I), 1-7
ranges, (I), 1-8
/C opcode qualifier
IEEE floating-point, (I), 4-60
VAX floating-point, (I), 4-60
Corrected error interrupts, logout area for, (II), 6-25
CPYS instruction, (I), 4-83
CPYSN instruction, (I), 4-83
CPU ID, (IV), 2-11
CPYSE instruction, (I), 4-83
CRB
See Console routine block
CTB
See Console terminal block
Current mode field
in PS register, (II), 6-6
Current PC
defined, (II), 6-2
CVTDG instruction, (I), 4-96
CVTGD instruction, (I), 4-96
CVTGF instruction, (I), 4-96
CVTGQ instruction, (I), 4-94
CVTLQ instruction, (I), 4-84
CVTQF instruction, (I), 4-95
CVTQG instruction, (I), 4-95
CVTQL instruction, (I), 4-84
CVTQS instruction, (I), 4-99
CVTQT instruction, (I), 4-99
CVTTQ instruction, (I), 4-98
CVTTS instruction, (I), 4-100

Data alignment, A-6
Data alignment trap, (II), 6-15
Data alignment trap fixup (DAT) bit
in HWPCB, (II), 4-2
Data alignment trap fixup (DATFX) register, (II), 5-9
Data alignment traps
memory management, (ID), ped 
registers used, (II), 6-16; (III), 5-4
service routine entry point, (II), 6-28
system entry for, (III), 5-8

Data format, overview, (I), 1-3

Data sharing (multiprocessor), A-7
synchronization requirement, (I), 5-5

Data stream considerations, A-6

Data stream DMA, (I), 8-11

Data types 
byte, (I), 2-1 
IEEE floating-point, (1), 2~7 
longword, (I), 2~ 
longword integer, (I), 2-10
quadword, (I), 2-2
quadword integer, (I), 2-11
unsupported in hardware, (I), 2-12
VAX floating-point, (I), 2-3
word, (I), 2-1
Denormal, (I), 4-58
Detached console, (IV), 1-2
Devices
conceptual flow of interrupts, (I), 8-18
CSRs, (I), 8-17
shared data structures and, (I), 8-10, 8-17
Dirty zero, (I), 4-58
Disk bootstrap image, (IV), 3-33
DIVF instruction, (I), 4-102
DIVG instruction, (I), 4-102
Division
integer, A-12
performance impact of, A-12
Division by zero trap, (II), 6-14; (III), 5-5
DIVS instruction, (I), 4-104
DIVT instruction, (I), 4-104
DMA, (I), 8-10
atomic, (I), 8-10
control stream, (I), 8-11
data stream, (I), 8-11
defined, (I), 8-2
interrupts with, (I), 8-12
DON field (mailbox), (I), 8-6
/D opcode qualifier
FPCR (floating-point control register), (I), 4-64
IEEE floating-point, (I), 4-60
draina (PALcode) instruction, (I), 6-6
DRAINA (PALcode) instruction, (I), 6-6
Dual-issue instruction considerations, A-2
DUMP_DEV variable, (IV), 2-23
DZE bit
exception summary parameter, (II), 6-13
exception summary register, (III), 5-5
D_floating data type, (I), 2-6
alignment of, (I), 2-7
mapping, (I), 2-6
restricted, (I), 2-7


E 
Embedded console, (IV), 1-2 
ENABLE_AUDIT variable, (IV), 2-24 


 entArith 


See Arithmetic trap entry 
entIF 

See Instruction fault entry 
entInt

See Interrupt entry | 
entMM 

See Memory-management fault entry 
entSys 

See System call entry 
Environment variables, (IV), 2—20 
EQV instruction, (1), 4-37 
ERR field (mailbox), (I), 8-6
Error checking, (I), 8-6
Error halt and recovery, (IV), 3-26
Error messages, console, (IV), 1-3
Errors, processor
corrected, (II), 6-23
uncorrected, (II), 6-23
Errors, system
corrected, (II), 6-22
uncorrected, (II), 6-22
Exceptional events
actions, summarized, (II), 6-2
contrasted, (II), 6-2
defined, (II), 6-1
Exception handlers, B-2
TRAPB instruction with, (I), 4-120
Exception register write mask, (III), 5-6
Exceptions
See also Arithmetic traps; Faults; Synchronous traps
actions, summarized, (II), 6-2
defined, (III), 5-1
initiated before interrupts, (II), 6-18
initiated by PALcode, (II), 6-31
introduced, (II), 6-8
processor state transitions, (II), 6-36
stack frames, (II), 6-7
stack frames for, (III), 5-3
Exception service routines
entry point, (II), 6-26
introduced, (II), 6-8
Exception summary parameter, (II), 6-13
Exception summary register, (III), 5-2, 5-6
format of, (III), 5-4
Executive read enable (ERE), bit in PTE, (II), 3-5
Executive stack pointer (ESP)
as internal processor register, (II), 5-1
in HWPCB, (II), 4-2
Executive stack pointer (ESP) register, (II), 5-27
Executive write enable (EWE), bit in PTE, (II), 3-6
EXTBL instruction, (I), 4-46
EXTLH instruction, (I), 4-46
EXTLL instruction, (I), 4-46
EXTQH instruction, (I), 4-46
EXTQL instruction, (I), 4-46
Extract instructions (list), (I), 4-46
EXTWH instruction, (I), 4-46
EXTWL instruction, (I), 4-46


Fault on execute (FOE), (II), 6-12
bit in PTE, (II), 3-4; (III), 3-5
service routine entry point, (II), 6-27
software usage of, (II), 6-12
Fault-on-execute fault, (III), 3-10
Fault on read (FOR), (II), 6-10
bit in PTE, (II), 3-4; (III), 3-5
service routine entry point, (II), 6-27
software usage of, (II), 6-10
Fault-on-read fault, (III), 3-10
Fault on write (FOW), (II), 6-11
bit in PTE, (II), 3-4; (III), 3-5
service routine entry point, (II), 6-27
software usage of, (II), 6-11
Fault-on-write fault, (III), 3-10
Faults
access control violation, (II), 6-10
defined, (II), 6-8; (III), 5-1
fault on execute, (II), 6-12
fault on read, (II), 6-10
fault on write, (II), 6-11
floating-point disabled, (II), 6-10
memory management, (III), 3-9
MM flag, (II), 6-10
program counter (PC) value, (II), 6-8
REI instruction with, (II), 6-8
translation not valid, (II), 6-10
FBEQ instruction, (I), 4-78
FBGE instruction, (I), 4-78
FBGT instruction, (I), 4-78
FBLE instruction, (I), 4-78
FBLT instruction, (I), 4-78
FBNE instruction, (I), 4-78
FCMOVEQ instruction, (I), 4-85
FCMOVGE instruction, (I), 4-85
FCMOVGT instruction, (I), 4-85
FCMOVLE instruction, (I), 4-85
FCMOVLT instruction, (I), 4-85
FCMOVNE instruction, (I), 4-85
FETCH (prefetch data) instruction, (I), 4-115
performance optimization, A-10
FETCH_M (prefetch data, modify intent) instruction, (I), 4-115
performance optimization, A-10
Field replaceable unit table, (IV), 2-19
Finite number, Alpha, contrasted with VAX, (I), 4-57
FIXUP console routine, (IV), 2-59
implementation considerations, (IV), 2-75
using, (IV), 2-64
Floating-point branch instructions, (I), 4-77
Floating-point control register (FPCR), (I), 4-64
accessing, (I), 4-66
at processor initialization, (I), 4-67
bit descriptions, (I), 4-65
instructions to read/write, (I), 4-87
operate instructions that use, (I), 4-80
saving and restoring, (I), 4-67
Floating-point convert instructions, (I), 3-12
Floating-point disabled fault, (II), 6-10
service routine entry point, (II), 6-27
Floating-point division, performance impact of, A-12
Floating-point enable (FEN) register
defined, (II), 1-3
described, (II), 5-10
in HWPCB, (II), 4-2
privileged context, (II), 2-91
Floating-point format, number representation (encodings), (I), 4-58
Floating-point instructions
branch (list), (I), 4-77
faults, (I), 4-56
introduced, (I), 4-56
memory format (list), (I), 4-68
operate (list), (I), 4-80
rounding modes, (I), 4-59
terminology, (I), 4-57
trapping modes, (I), 4-60
traps, (I), 4-56
Floating-point load instructions, (I), 4-68
load F_floating, (I), 4-69
load G_floating, (I), 4-70
load S_floating, (I), 4-71
load T_floating, (I), 4-72
with nonfinite values, (I), 4-68
Floating-point operate instructions, (I), 4-80
add (IEEE), (I), 4-89
add (VAX), (I), 4-88
compare (IEEE), (I), 4-92
compare (VAX), (I), 4-91
conditional move, (I), 4-85
convert IEEE floating to IEEE floating, (I), 4-100
convert IEEE floating to integer, (I), 4-98
convert integer to IEEE floating, (I), 4-99
convert integer to integer, (I), 4-84
convert integer to VAX floating, (I), 4-95
convert VAX floating to integer, (I), 4-94
convert VAX floating to VAX floating, (I), 4-96
copy sign, (I), 4-83
divide (IEEE), (I), 4-104
divide (VAX), (I), 4-102
format of, (I), 3-11
move from/to FPCR, (I), 4-87
multiply (IEEE), (I), 4-107
multiply (VAX), (I), 4-106
opcodes for, C-3
subtract (IEEE), (I), 4-111
subtract (VAX), (I), 4-109
Floating-point registers, (I), 3-2
Floating-point rounding modes
IEEE, (I), 4-59
VAX, (I), 4-59
Floating-point single-precision operations, (I), 4-64
Floating-point store instructions, (I), 4-68
store F_floating, (I), 4-73
store G_floating, (I), 4-74
store S_floating, (I), 4-75
store T_floating, (I), 4-76
with nonfinite values, (I), 4-68
Floating-point support
FPCR (floating-point control register), (I), 4-64
IEEE, (I), 2-7
IEEE standard 754-1985, (I), 4-67
instruction overview, (I), 4-56
longword integer, (I), 2-10
operate instructions, (I), 4-80
optional with Alpha, (I), 4-2
quadword integer, (I), 2-11
rounding modes, (I), 4-59
single-precision operations, (I), 4-64
trap modes, (I), 4-60
VAX, (I), 2-3
Floating-point trapping modes, (I), 4-60
See also Arithmetic traps
imprecision from pipelining, (I), 4-62
FOE
See Fault on execute
FOR
See Fault on read
FOW
See Fault on write
FPCR (floating-point control register)
See Floating-point control register (FPCR)
Frame pointer (FP), register linkage for, (III), 1-1
FRU, (IV), 2-19
Futurebus+ CMD field, D-5
F_floating data type, (I), 2-3
alignment of, (I), 2-5
compared to IEEE S_floating, (I), 2-8
MAX/MIN, (I), 4-58
operations, (I), 4-64
when data is unaligned, (II), 6-28

gentrap (PALcode) instruction, (III), 2-5
required recognition of, (I), 6-4
GENTRAP (PALcode) instruction, (II), 2-10
required recognition of, (I), 6-4
trap information, (II), 6-17
GETC console routine, (IV), 2-31
GET_ENV console routine, (IV), 2-54
Global pointer (GP), register linkage for, (III),
Granularity hint (GH)
bits in PTE, (II), 3-5; (III), 3-4
G_floating data type, (I), 2-5
alignment of, (I), 2-6
mapping, (I), 2-5
MAX/MIN, (I), 4-58
when data is unaligned, (II), 6-28

halt (PALcode) instruction, (I), 6-7
HALT (PALcode) instruction, (I), 6-7
Halting the processor, (I), 6-7
Hardware context, (II), 4-1
Hardware interrupts
interprocessor, (II), 6-21
interval clock, (II), 6-20
powerfail, (II), 6-22
servicing, (III), 5-6
Hardware nonprivileged context, (II), 4-3
Hardware privileged context, (II), 4-2
switching, (II), 4-2
Hardware privileged context block (HWPCB)
at cold boot, (IV), 3-17
at warm boot, (IV), 3-19
format, (II), 4-2
original built by HWRPB, (II), 4-4
PCBB register, (II), 5-16
process unique value in, (II), 2-80
specified by PCBB, (II), 4-2
swapping ownership, (II), 2-90
writing to, (II), 4-3
Hardware restart parameter block (HWRPB), (IV), 2-1
field contents, (IV), 2-4
interval clock interrupt, (II), 6-20
loading at cold boot, (IV), 3-9
logout area, (II), 6-25
overview, (IV), 2-2
per-CPU slots, (IV), 2-11
per-CPU slots structure, (IV), 2-13
revision field, (IV), 2-9
structure, (IV), 2-3
system type and variation field, (IV), 2-9
TB hint block, (IV), 2-10
Hose, (I), 8-1
HOSE field (mailbox), (I), 8-5
HWPCB
See Hardware privileged context block
HWRPB
See Hardware restart parameter block


I/O access granularity, (I), 8-2, 8-14
I/O bus, access delay, (I), 8-14
I/O device interrupts, (II), 6-20
I/O devices, service routine entry points, (II), 6-30
I/O implementation dependencies, (I), 8-13
I/O space read/write ordering, (I), 8-2, 8-7
I/O subsystem design, implementation considerations, (I), 8-13

IEEE convert-to-integer trap mode, instruction notation for, (I), 4-61
IEEE floating-point
See also Floating-point instructions
exception handlers, B-2
format, (I), 2-7
FPCR (floating-point control register), (I), 4-64
hardware support, B-1
NaN, (I), 2-8
options, B-1
standard, mapping to, B-3
standard charts, B-10
S_floating, (I), 2-8
trap handling, B-4
trap modes, (I), 4-62
T_floating, (I), 2-9
IEEE floating-point instructions
add instructions, (I), 4-89
compare instructions, (I), 4-92
convert from integer instructions, (I), 4-99
convert IEEE floating format instructions, (I), 4-100
convert to integer instructions, (I), 4-98
divide instructions, (I), 4-104
multiply instructions, (I), 4-107
opcodes for, C-4
operate instructions, (I), 4-80
qualifiers, summarized, C-4
subtract instructions, (I), 4-111
IEEE rounding modes, (I), 4-59
IEEE standard
conformance to, B-1
mapping to, B-3
support for, (I), 4-67
IEEE trap modes, required instruction notation, (I), 4-61
IGN (ignore), (I), 1-9
Illegal instruction trap, (II), 6-16
service routine entry point, (II), 6-28
Illegal operand trap
service routine entry point, (II), 6-28
Illegal PALcode operand trap, (II), 6-17
imb (PALcode) instruction, (I), 6-8
IMB (PALcode) instruction, (I), 5-17, 6-8
virtual I-cache coherency, (I), 5-5
IMP (implementation dependent), (I), 1-9
INE bit
exception summary parameter, (II), 6-13
exception summary register, (III), 5-5
Inexact result trap, (II), 6-15; (III), 5-5
Infinity, (I), 4-57
Initial virtual memory regions, (IV), 3-11
Input/output interrupts, (II), 6-22
INSBL instruction, (I), 4-50
Insert instructions (list), (I), 4-50
Insert into queue PALcode instructions
longword at head interlocked, (II), 2-31
longword at head interlocked resident, (II), 2-33, 2-48
longword at tail interlocked, (II), 2-39
longword at tail interlocked resident, (II), 2-42, 2-50
quadword at head interlocked, (II), 2-35
quadword at head interlocked resident, (II), 2-37
quadword at tail interlocked, (II), 2-44
quadword at tail interlocked resident, (II), 2-46
INSLH instruction, (I), 4-50
INSLL instruction, (I), 4-50
INSQHIL (PALcode) instruction, (II), 2-31
INSQHILR (PALcode) instruction, (II), 2-33
INSQH instruction, (I), 4-50
INSQHIQ (PALcode) instruction, (II), 2-35
INSQHIQR (PALcode) instruction, (II), 2-37
INSQL instruction, (I), 4-50
INSQTIL (PALcode) instruction, (II), 2-39
INSQTILR (PALcode) instruction, (II), 2-42
INSQTIQ (PALcode) instruction, (II), 2-44
INSQTIQR (PALcode) instruction, (II), 2-46
INSQUEL (PALcode) instruction, (II), 2-48
INSQUEL/D (PALcode) instruction, (II), 2-48
INSQUEQ (PALcode) instruction, (II), 2-50
INSQUEQ/D (PALcode) instruction, (II), 2-50
Instruction encodings 

floating-point format, C-3 

summarized, C—1 
Instruction fault
system entry for, (III), 5-3
Instruction fault entry (entIF) register, (III), 1-2, 5-3, 5-6

Instruction formats 

branch, (J), 3-10 

conventions, (J), 3-8 

floating-point convert, (I), 3-12 

floating-point operate, (J), 3-11 

illegal trap, (II), 6-16 

memory, (J), 3-9 

memory jump, (J), 3-10 

operands, (J), 3-8 

operand values, (J), 3-8 

operate, (I), 3-10
operators, (I), 3-5
overview, (I), 1-4
PALcode, (I), 3-12
registers, (I), 3-1
Instructions, overview of, (I), 1-5
Instruction set
See also Floating-point instructions; PALcode instructions
access type field, (I), 3-4
Boolean (list), (I), 4-36
branch (list), (I), 4-16
byte (list), (I), 4-42
conditional move (integer), (I), 4-38
data type field, (I), 3-5
extract (list), (I), 4-42
floating-point subsetting, (I), 4-2
insert (list), (I), 4-42
integer arithmetic (list), (I), 4-22
introduced, (I), 1-6
jump (list), (I), 4-16
load memory integer (list), (I), 4-4
mask (list), (I), 4-42
miscellaneous (list), (I), 4-113
name field, (I), 3-4
opcode qualifiers, (I), 4-3
operand notation, (I), 3-4
overview, (I), 4-1
shift, arithmetic, (I), 4-41
shift, logical, (I), 4-40
software emulation rules, (I), 4-2
store memory integer (list), (I), 4-4
VAX compatibility, (I), 4-121
Instruction stream 
See I-stream 
INSWH instruction, (J), 4—50 
INSWL instruction, (J), 4~50 
Integer arithmetic instructions 
See Arithmetic instructions 
Integer division, A-12 
Integer overflow trap, (II), 6-15; (III), 5-5
Integer registers
defined, (I), 3-1
R31 restrictions, (I), 3-1
Integer register usage, (III), 1-1
Internal processor registers (IPR)
address space number (ASN), (II), 5-4
AST enable (ASTEN), (II), 5-5
AST summary (ASTSR), (II), 5-7
CALL_PAL MFPR with, (II), 5-1
CALL_PAL MTPR with, (II), 5-1
data alignment trap fixup (DATFX), (II), 5-9
defined, (II), 1-1
executive stack pointer (ESP), (II), 5-27
floating-point enable (FEN), (II), 5-10
interprocessor interrupt request (IPIR) register, (II), 5-11
interrupt priority level (IPL), (II), 5-12
kernel mode with, (II), 5-1
machine check error summary (MCES), (II), 5-13
MFPR instruction with, (II), 2-86
MTPR instruction with, (II), 2-87
page table base (PTBR), (II), 5-18
performance monitoring (PERFMON), (II), 5-15
privileged context block base (PCBB), (II), 5-16
processor base (PRBR), (II), 5-17
software interrupt request (SIRR), (II), 5-20
software interrupt summary (SISR), (II), 5-21
summary, (II), 5-2
supervisor stack pointer (SSP), (II), 5-28
system control block base (SCBB), (II), 5-19
translation buffer check (TBCHK), (II), 5-22
translation buffer invalidate all (TBIA), (II), 5-24
translation buffer invalidate all process (TBIAP), (II), 5-25
translation buffer invalidate single (TBIS), (II), 5-26
user stack pointer (USP), (II), 5-29
virtual page base (VPTB), (II), 5-30
Who-Am-I (WHAMI), (II), 5-31


Interprocessor console communications, (IV), 2-68
implementation considerations, (IV), 2-76
Interprocessor interrupt, (II), 6-21
protocol for, (II), 6-21
service routine entry point, (II), 6-30
Interprocessor interrupt request (IPIR) register
described, (II), 5-11
protocol for, (II), 6-21
Interrupt entry (entInt) register, (III), 1-2,
Interrupt priority level (IPL), (I), 8-18
See also Interrupt priority level (IPL) register
associated events, (II), 6-18
field in PS register, (II), 6-6
hardware levels, (II), 6-7
kernel mode software with, (II), 6-18
operation of, (II), 6-17
PS with, (III), 5-2
recording pending software (SISR register), (II), 5-21
requesting software (SIRR register), (II), 5-20
service routine entry points, (II), 6-30
software interrupts, (II), 6-19
software levels, (II), 6-7
starvation and timeouts, (I), 8-15
Interrupt priority level (IPL) register
See also Interrupt priority level (IPL)
described, (II), 5-12
interrupt arbitration, (II), 6-35
Interrupts
actions, summarized, (II), 6-2
device, (I), 8-18
from I/O devices, (I), 8-12
hardware arbitration, (II), 6-34
I/O device, (II), 6-20
initiated by PALcode, (II), 6-31
initiation, (II), 6-18
input/output, (II), 6-22
instruction completion, (II), 6-17
interprocessor, (II), 6-21
introduced, (II), 6-17
multiply targeted, (I), 8-18
ordering of, (I), 8-19
PALcode arbitration, (II), 6-34
passive release, (II), 6-20
powerfail, (II), 6-22
processor state transitions, (II), 6-36
program counter value, (II), 6-2
software, (II), 6-19
sources for, (III), 5-2
stack frames, (II), 6-7
stack frames for, (III), 5-3
system entry for, (III), 5-4
vectors, (I), 8-12
Interrupt service routines
entry point, (II), 6-26
in each process, (II), 6-18
introduced, (II), 6-17
Interval clock interrupt, (II), 6-20
service routine entry point, (II), 6-30
Invalid operation trap, (II), 6-14; (III), 5-5
INV bit
exception summary parameter, (II), 6-13
exception summary register, (III), 5-5
IOCTL console routine, (IV), 2-45
/I opcode qualifier, IEEE floating-point, (I), 4-61
IOV bit
exception summary parameter, (II), 6-14
exception summary register, (III), 5-5


IPR
See Internal processor registers (IPR)
IPR_KSP (internal processor register kernel stack pointer), (II), 5-1
ISO-LATIN-1 support, (IV), 1-6
I-stream
coherency with D-stream, (I), 6-8
design considerations, A-2
modifying physical, (I), 5-5
modifying virtual, (I), 5-5
PALcode with, (I), 6-3
with caches, (I), 5-5
I-stream coherency, (I), 6-8


J 


JMP instruction, (I), 4-20
JSR instruction, (I), 4-20
JSR_COROUTINE instruction, (I), 4-20
Jump instructions, (I), 4-16, 4-20
See also Control instructions
branch prediction logic, (I), 4-21
coroutine linkage, (I), 4-21
return from subroutine, (I), 4-20
unconditional long jump, (I), 4-21


K 


Kernel global pointer (KGP), (III), 1-3
Kernel mode, protection code with, (II), 3-6
Kernel read enable (KRE)
bit in PTE, (II), 3-5; (III), 3-4
with access control violation (ACV) fault, (II), 3-13
Kernel stack, PALcode access to, (II), 6-31
Kernel stack pointer (KSP)
defined, (III), 1-3
in HWPCB, (II), 4-2
Kernel write enable (KWE)
bit in PTE, (II), 3-6; (III), 3-4
Kseg
format of, (III), 3-2
mapping of, (III), 3-1
physical space with, (III), 3-3


L 


LANGUAGE variable, (IV), 2-24
LDAH instruction, (I), 4-5
LDA instruction, (I), 4-5
LDF instruction, (I), 4-69
when data is unaligned, (II), 6-28
LDG instruction, (I), 4-70
when data is unaligned, (II), 6-28
LDL instruction, (I), 4-6
when data is unaligned, (II), 6-28
LDL_L instruction, (I), 4-8
restrictions, (I), 4-9
when data is unaligned, (II), 6-28
with processor lock register/flag, (I), 4-8
with STx_C instruction, (I), 4-8
LDQ instruction, (I), 4-6
when data is unaligned, (II), 6-28
LDQP (PALcode) instruction, (II), 2-85
LDQ_L instruction, (I), 4-8
restrictions, (I), 4-9
when data is unaligned, (II), 6-28
with processor lock register/flag, (I), 4-8
with STx_C instruction, (I), 4-8
LDQ_U instruction, (I), 4-7
LDS instruction, (I), 4-71
when data is unaligned, (II), 6-28
LDT instruction, (I), 4-72
when data is unaligned, (II), 6-28
LICENSE variable, (IV), 2-24
Literals, operand notation, (I), 3-4
LK keyboard graphic display, E-2
Load instructions
See also Floating-point load instructions
emulation of, (I), 4-2
FETCH instruction, (I), 4-115
load address, (I), 4-5
load address high, (I), 4-5
load quadword, (I), 4-6
load quadword locked, (I), 4-8
load sign-extended longword, (I), 4-6
load sign-extended longword locked, (I), 4-8
load unaligned quadword, (I), 4-7
multiprocessor environment, (I), 5-5
serialization, (I), 4-117
when data is unaligned, (II), 6-28
Load literal, A-13
Load memory integer instructions (list), (I), 4-4
Local devices, (I), 8-1
Local I/O space, (I), 8-2
flow control, (I), 8-15
Local side, (I), 8-1
Location, (I), 5-10
Location access order
defined, (I), 5-11
with processor issue order, (I), 5-12
Lock flag, per-processor
defined, (I), 3-2
with load locked instructions, (I), 4-8
with store conditional instructions, (I), 4-11
Lockout, (I), 8-3
Lock registers, per-processor
defined, (I), 3-2
with load locked instructions, (I), 4-8
with store conditional instructions, (I), 4-11
Lock_flag register, (III), 1-3
Logical instructions
See Boolean instructions
Logout area, (II), 6-25; (III), 5-7
Longword data type, (I), 2-2
alignment of, (I), 2-11
atomic access of, (I), 5-2
integer floating-point format, (I), 2-10
LSB (least significant bit), defined for floating-point, (I), 4-57


M


Machine check error summary (MCES) register
described, (II), 5-13
using, (II), 6-24
Machine checks, (II), 6-22; (III), 5-6
actions, summarized, (II), 6-2
cannot disable, (II), 6-24
initiated by PALcode, (II), 6-31
introduced, (II), 6-22
logout area, (II), 6-25
masking, (II), 6-23
one per error, (II), 6-24
processor correctable, (II), 6-23
program counter (PC) value, (II), 6-24
REI instruction with, (II), 6-23
retry flag, (II), 6-24
service routine entry points, (II), 6-30
stack frames, (II), 6-7
system correctable, (II), 6-23
Magtape bootstrap image
ANSI format, (IV), 3-35
boot blocked, (IV), 3-37
Mailbox
address alignment, (I), 8-4
bus-specific implementations for, (I), 8-12
CMD field checking, (I), 8-13
comparison to direct access method, (I), 8-14
error reporting, (I), 8-8
field checking, (I), 8-12
modification by host, (I), 8-6
observing effects of remote writes, (I), 8-16
operational definition, (I), 8-2
posting, (I), 8-2
posting software with, (I), 8-6
remote reads, (I), 8-6, 8-8
remote writes, (I), 8-6, 8-9
static, (I), 8-6
structure, (I), 8-5
synchronization with, (I), 8-16
translating direct accesses, (I), 8-14
use of STQ_C lock_flag, (I), 8-3, 8-8, 8-15
WHO_ARE_YOU command, (I), 8-13
with I/O space granularity, (I), 8-7
Mailbox pointer (MBPR) register, (I), 8-4
definition, (I), 8-2
flow control, (I), 8-15
ordering, (I), 8-7
Mailbox starvation, (I), 8-16
Major modes, (IV), 3-3
Major states, (IV), 3-1
Major state transitions, (IV), 3-2
console rules, (IV), 3-3
MASK field (mailbox), (I), 8-5
Masking, machine checks with, (II), 6-23
Mask instructions (list), (I), 4-52
MAX, defined for floating-point, (I), 4-59
maxCPU, (III), 1-2
MB (memory barrier) instruction, (I), 4-117
See also IMB
multiprocessors only, (I), 4-117
using, (I), 5-18
with DMA I/O, (I), 5-17
with multiprocessor D-stream, (I), 5-17
MBPR
See Mailbox pointer (MBPR) register
MBZ (must be zero), (I), 1-9
MEMDSC
See Memory data descriptor table
Memory, unrecoverable errors with, (II), 6-22
Memory access
aligned byte/word, A-11
coherency of, (I), 5-1
granularity of, (I), 5-2
width of, (I), 5-2
Memory access sequence, (I), 5-11
Memory alignment, requirement for, (I), 5-2
Memory cluster descriptor (MEMC) table
structure, (IV), 3-8
Memory data descriptor (MEMDSC) table
structure, (IV), 3-7
with cold boot, (IV), 3-6
Memory format instructions
function codes, summarized, C-1
opcodes for, C-1
Memory instruction format, (I), 3-9
with function code, (I), 3-9
Memory interlocks, (I), 8-17
Memory jump instruction format, (I), 3-10
Memory-like behavior, (I), 5-3
Memory management
See also Address translation; Pages; Processor modes; Virtual address space
address translation, (II), 3-8
always enabled, (II), 3-3
control of, (II), 3-3
faults, (II), 3-13, 6-9; (III), 3-9
introduced, (II), 3-1
page frame number (PFN), (II), 3-6
page table entry (PTE), (II), 3-3
protection code, (II), 3-8
protection of individual pages, (II), 3-7
PTE modified by software, (II), 3-7
support in PALcode, (I), 6-3
translation buffer with, (II), 3-11
unrecoverable error, (II), 6-22
with interrupts, (II), 6-18
with multiprocessors, (II), 3-7
with process context, (II), 4-1
Memory-management fault entry (entMM) register, (III), 1-2, 5-4, 5-7
Memory management faults
registers used, (II), 6-10
system entry for, (III), 5-4
types, (III), 3-9
with unaligned data, (II), 6-16
Memory prefetch registers, A-10
defined, (I), 3-2
Memory protection, (II), 3-6
MFPR_IPR_name (PALcode) instruction, (II), 2-86
MF_FPCR instruction, (I), 4-87
MIN, defined for floating-point, (I), 4-58
Miscellaneous instructions (list), (I), 4-113
MMCSR, (III), 5-7
MMCSR code, (III), 3-9
MOP-based network bootstrapping, (IV), 3-39
/M opcode qualifier, IEEE floating-point, (I), 4-60
Move, register-to-register, A-14
Move instructions (conditional)
See Conditional move instructions
MSKBL instruction, (I), 4-52
MSKLH instruction, (I), 4-52
MSKLL instruction, (I), 4-52
MSKQL instruction, (I), 4-52
MSKWH instruction, (I), 4-52
MSKWL instruction, (I), 4-52
MTPR_IPR_name (PALcode) instruction, (II), 2-87
MT_FPCR instruction, (I), 4-87
synchronization requirement, (I), 4-66
MULF instruction, (I), 4-106
MULG instruction, (I), 4-106
MULL instruction, (I), 4-29
with MULQ, (I), 4-29
MULQ instruction, (I), 4-30
with MULL, (I), 4-29
with UMULH, (I), 4-30
MULS instruction, (I), 4-107
MULT instruction, (I), 4-107
Multiple instruction issue, A-2
Multiply instructions
See also Floating-point operate
multiply longword, (I), 4-29
multiply quadword, (I), 4-30
multiply unsigned quadword high, (I), 4-31
Multiprocessor bootstrapping, (IV), 3-19
primary processor, (IV), 3-19
Multiprocessor environment
See also Data sharing
booting, (IV), 3-19
cache coherency in, (I), 5-5
console requirements, (IV), 2-21
context switching, (I), 5-18
interprocessor interrupt, (II), 6-21
I-stream reliability, (I), 5-17
MB instruction with, (I), 5-17
memory faults, (II), 6-10
memory management in, (II), 3-7
move operations in, (II), 2-76
no implied barriers, (I), 5-16
read/write ordering, (I), 5-9
serialization requirements in, (I), 4-117
shared data, (I), 5-5, A-7
Multiprocessors
I/O with, (I), 8-3
interrupts with, (I), 8-12
Multithread implementation, (II), 2-80


N 


NaN (Not-a-Number)
defined, (I), 2-8
Quiet, (I), 4-57
Signaling, (I), 4-57
NATURALLY ALIGNED data objects, (I), 1-9
Negate stylized code form, A-14
Network bootstrapping, (IV), 3-39
implementation considerations, (IV), 3-50
Next PC, (II), 6-2
Next PC, defined for arithmetic traps, (II), 6-14
Nonmemory-like behavior, (I), 5-3
NOP, A-13
NOT instruction, ORNOT with zero, (I), 4-37
NOT stylized code form, A-14
Opcode qualifiers
See also specific qualifiers
default values, (I), 4-3
notation (list), (I), 4-3
Opcodes
DEC OSF/1, C-9
OpenVMS, C-8
reserved, C-10
summarized, C-6
opDec, (III), 1-4
OPEN console routine, (IV), 2-47
OpenVMS PALcode instruction opcodes, C-8
OpenVMS PALcode instructions (list), (II), 2-2
Operand expressions, (I), 3-3
Operand notation
defined, (I), 3-3
from VAX architecture standard, (I), 3-4
Operand values, (I), 3-3
Operate format instructions, opcodes for, C-2
Operate instruction format, (I), 3-10
floating-point, (I), 3-11
floating-point convert, (I), 3-12
Operators, instruction format, (I), 3-5
Optimization
See Performance optimizations
ORNOT instruction, (I), 4-37
OSF/1 PALcode instruction opcodes, C-9
Overflow trap, (II), 6-15; (III), 5-5
OVF bit
exception summary parameter, (II), 6-13
exception summary register, (III), 5-5


P 


Page frame number (PFN)
bits in PTE, (II), 3-6; (III), 3-4
determining validation, (II), 3-4
finding for SCB, (II), 5-19
PTBR register, (II), 5-18
with address translation, (II), 3-9
with hardware context switching, (II), 4-3
Pages
collecting statistics on, (II), 6-11
individual protection of, (II), 3-7
max address size from, (II), 3-3
possible sizes for, (II), 3-2
size range of, (III), 3-1
virtual address space from, (II), 3-2
pageSize, (III), 1-2
Page sizes, (III), 3-2
Page table base (PTBR) register, (II), 5-18
defined, (III), 1-3
in HWPCB, (II), 4-2
privileged context, (II), 2-91
with address translation, (II), 3-9
Page table entry (PTE), (II), 3-3
atomic modification of, (II), 3-7
bit summary, (III), 3-4
calculating at cold boot, (IV), 3-13
changing and managing, (III), 3-5
format of, (II), 3-3
modified by software, (II), 3-7
page protection, (II), 3-8
physical access of, (III), 3-6
virtual access of, (III), 3-7
with multiprocessors, (II), 3-7
Page tables
address space conflicts, (IV), 3-47
address space/page size, (IV), 3-48
calculating base, (IV), 3-14
coarseness effect, (IV), 3-46
initial mapping at cold boot, (IV), 3-13
locating space for, (IV), 3-47
space at cold boot, (IV), 3-10
Page table space, loading at cold boot, (IV), 3-10
Page table space location, (IV), 3-48
PALcode
See also Queues, support for
access to kernel stack, (II), 6-31
barriers with, (I), 5-16
CALL_PAL instruction, (I), 4-114
compared to hardware instructions, (I), 6-1
defined for OpenVMS, (II), 2-1
illegal operand trap, (II), 6-17
implementation-specific, (I), 6-3
instead of microcode, (I), 6-1
instruction format, (I), 3-12
memory management requirements, (II), 3-3
OSF/1 support for, (III), 5-8
overview, (I), 6-1
processor state transitions, (II), 6-36
queue data type support, (II), 2-21
recognized instructions, (I), 6-4
replacing, (I), 6-4
required function support, (I), 6-3
required instructions, (I), 6-5
running environment, (I), 6-2
special functions, (I), 6-3

PALcode instructions
opcodes for required, C-10
OpenVMS (list), (II), 2-2
privileged OpenVMS (list), (II), 2-83
privileged OSF/1 (list), (III), 2-8
reserved, opcodes for, C-10
restricted
threaded OpenVMS, (II), 2-80
unprivileged general (list), (II), 2-3
unprivileged OSF/1 (list), (III), 2-1


PALcode instructions, privileged
See also individual instructions
cache flush, (II),
drain aborts, (I), 6-6
halt processor, (I), 6-7
load quadword physical, (II), 2-85
move from processor register, (II), 2-86
move to processor register, (II), 2-87
read processor status, (III), 2-9
read system value, (III), 2-11
read user stack pointer, (III), 2-10
return from system call, (III), 2-12
return from trap, fault, or interrupt, (III), 2-13
store quadword physical, (II), 2-88
swap IPL, (III), 2-16
swap privileged context, (II), 2-89
swap process context, (III), 2-14
TB (translation buffer) invalidate, (III), 2-17
who am I, (III), 2-18
write floating-point enable, (III), 2-21
write kernel global pointer, (III), 2-22
write system entry address, (III), 2-19
write system value, (III), 2-24
write user stack pointer, (III), 2-23
write virtual page table pointer, (III), 2-25


PALcode instructions, thread, (II), 2-80
read unique context, (II), 2-81
write unique context, (II), 2-82
PALcode instructions, unprivileged
See also individual instructions
breakpoint, (II), 2-4; (III), 2-2
bugcheck, (II), 2-5; (III), 2-3
change to executive mode, (II), 2-6
change to kernel mode, (II), 2-7
change to supervisor mode, (II), 2-8
change to user mode, (II), 2-9
generate software trap, (II), 2-10
generate trap, (III), 2-5
insert into queue (list), (II), 2-30
I-stream memory barrier, (I), 6-8
probe for read access, (II), 2-11
probe for write access, (II), 2-11
read processor status, (II), 2-13
read system cycle counter, (II), 2-17
read unique value, (III), 2-6
remove from queue (list), (II), 2-30
return from exception or interrupt, (II), 2-14
swap AST enable, (II), 2-19
system call, (III), 2-4
write PS software field, (II), 2-20
write unique value, (III), 2-7
PALcode instructions, unprivileged general (list), (II), 2-3
PALcode loading at bootstrap, (IV), 3-9
PALRES0, (I), 6-3
PALRES1, (I), 6-3
PALRES2, (I), 6-3
PALRES3, (I), 6-3
PALRES4, (I), 6-3
Passive release interrupt entry point, (II),
Passive release interrupts, (II), 6-20
PC
See program counter register
PCC
See Process cycle counter
Per-CPU slots, (IV), 2-11
field contents, (IV), 2-14
starting address calculation, (IV), 2-12
structure, (IV), 2-13
Per-CPU state flags, (IV), 2-18
Performance monitoring register (PERFMON), (II), 5-15
Performance monitor interrupt entry point, (II), 6-30
Performance optimizations
branch prediction, A-3
code sequences, A-11
data stream, A-6
for frequently executed code, A-1
for I-streams, A-2
instruction alignment, A-2
instruction scheduling, A-5
I-stream density, A-5
multiple instruction issue, A-2
shared data, A-7
PFN
See Page frame number
Physical address translation, (II), 3-9
Physical space, (III), 3-3
PME
bit in HWPCB, (II), 4-2
PMI bus, (I), 8-1
uncorrected protocol errors, (II), 6-22
Powerfail
CFLUSH PALcode instruction with, (II), 6-22
Powerfail and recovery
multiprocessor, (IV), 3-25
split, (IV), 3-26
uniprocessor, (IV), 3-25
united, (IV), 3-26


Powerfail interrupt, (II), 6-22
service routine entry point, (II), 6-30
Power-up initialization, (IV), 3
Prefetch data (FETCH instruction), (I), 4-115
Prefetch data registers, A-10
Prefetching data, considerations, A-10
Primary bootstrap image
format, (IV), 3-33
loading at cold, (IV), 3-9
Primary processor
at multiprocessor boot, (IV), 3-20, 3-22
definition, (IV), 1-1
modes, (IV), 3-4
switching from, (IV), 3-28
Privileged Architecture Library
See PALcode
Privileged context, (II), 2-90
Privileged context block base (PCBB) register
described, (II), 5-16
Privileged PALcode instructions (list), (II), 2-83; (III), 2-8
PROBER (PALcode) instruction, (II), 2-11
PROBEW (PALcode) instruction, (II), 2-11
Process, (II), 4-1
context switching the, (II), 4-4
Process context, (III), 4-1
Process control block (PCB), (III), 4-1
structure, (III), 4-2
Process control block base (PCBB) register, (III), 1-3
Process cycle counter (PCC)
in HWPCB, (II), 4-2
privileged context, (II), 2-91
RPCC instruction with, (I), 4-118
system cycle counter with, (II), 2-17
Processor
adding to running system, (IV), 3-24
states and modes, (IV), 3-1
Processor base (PRBR) register, (II), 5-17
Processor identifiers, registered, D-1
Processor initialization, (IV), 3-16
Processor issue order
defined, (I), 5-11
with location access order, (I), 5-12
Processor issue sequence, (I), 5-10
Processor memory interconnect
See PMI bus
Processor modes
AST pending state, (II), 5-7
change to executive, (II), 2-6
change to kernel, (II), 2-7
change to supervisor, (II), 2-8
change to user, (II), 2-9
controlling memory access, (II), 3-8
enabling executive mode reads, (II), 3-5
enabling executive mode writes, (II), 3-6
enabling kernel mode reads, (II), 3-5
enabling supervisor mode reads, (II), 3-6
enabling supervisor mode writes, (II), 3-6
enabling user mode reads, (II), 3-6
enabling user mode writes, (II), 3-6
page access with, (II), 3-1
PALcode state transitions, (II), 6-36
Processor number, reading, (II), 5-31
Processors
address granularity of memory references, (I), 8-14
conceptual flow of I/O interrupts, (I), 8-18
switching primary, (IV), 2-60
Processor state, defined, (II), 6-5
Processor state flags, at multiprocessor boot, (IV), 3-23
Processor state transitions, (II), 6-36
Processor status (PS) register
bit meanings for, (III), 5-2
bootstrap values in, (II), 6-6
current, (II), 6-5
current mode field, (II), 6-6
defined, (II), 1-1; (III), 1-3
explicit reading of, (II), 6-5
in processor state, (II), 6-5
interrupt priority level (IPL) field, (II), 6-6
saved on stack, (II), 6-5
saved on stack frame, (II), 6-7
software (SW) field, (II), 6-6
stack alignment field, (II), 6-6
virtual machine monitor bit, (II), 6-6
WR_PS_SW instruction, (II), 2-20
Process unique value (unique) register, (III), 1-4
PROCESS_KEYCODE console routine, (IV), 2-33
implementation considerations, (IV), 2-75
Program counter (PC) register, (I), 3-1
alignment, (II), 6-6
current PC defined, (II), 6-2
defined, (III), 1-3
explicit reading of, (II), 6-6
in processor state, (II), 6-5
next PC defined, (II), 6-14
saved on stack frame, (II), 6-7
with arithmetic traps, (II), 6-14; (III), 5-1
with faults, (II), 6-8
with interrupts, (II), 6-2
with machine checks, (II), 6-23
with synchronous traps, (II), 6-15
Program I/O mode, (IV), 3-3
Protection code, (II), 3-8; (III), 3-6
Protection modes, (II), 6-7
PS<SP_ALIGN> field, (II), 2-13
Pseudo-ops, A-14
PSWITCH console routine, (IV), 2-60
PTE
See Page table entry
PUTS console routine, (IV), 2-36


Q 


Quadword data type, (I), 2-2
alignment of, (I), 2-3, 2-11
atomic access of, (I), 5-2
integer floating-point format, (I), 2-11
loading in physical memory, (II), 2-85
storing to physical memory, (II), 2-88
T_floating with, (I), 2-11
Queues, support for
absolute longword, (II), 2-21
absolute quadword, (II), 2-25
PALcode instructions (list), (II), 2-30
self-relative longword, (II), 2-21
self-relative quadword, (II), 2-26


R 


R31
restrictions, (I), 3-1
with arithmetic traps, (II), 6-12
RAZ (read as zero), (I), 1-9
RBADR field (mailbox), (I), 8-5
RC (read and clear) instruction, (I), 4-122
RDATA field (mailbox), (I), 8-6
rdps (PALcode) instruction, (III), 2-9
rdunique (PALcode) instruction, (III), 2-6
PCB with, (III), 4-1
required recognition of, (I), 6-4
RDUNIQUE (PALcode) instruction
required recognition of, (I), 6-4
rdusp (PALcode) instruction, (III), 2-10
PCB with, (III), 4-1
rdval (PALcode) instruction, (III), 2-11
RD_PS (PALcode) instruction, (II), 2-13
READ console routine, (IV), 2-49
Read/write, sequential, A-10
Read/write ordering (multiprocessor), (I), 5-9
determining requirements, (I), 5-9
memory location defined, (I), 5-10
READ_UNQ (PALcode) instruction, (II), 2-81
Registers, (I), 3-1
floating-point, (I), 3-2
integer, (I), 3-1
lock, (I), 3-2
memory prefetch, (I), 3-2
optional, (I), 3-2
program counter (PC), (I), 3-1
value when unused, (I), 3-8
VAX compatibility, (I), 3-2
with IPRs, (II), 5-1
Register-to-register move, A-14
Register write mask, with arithmetic traps, (II), 6-14
REI (PALcode) instruction, (II), 2-14
arithmetic traps, (II), 6-9
faults, (II), 6-8
interrupt arbitration, (II), 6-35
interrupts, (II), 6-2
machine checks, (II), 6-23
synchronous traps, (II), 6-15
Remote devices
defined, (I), 8-1
interrupts with, (I), 8-12
with DMA, (I), 8-10
Remote I/O space, (I), 8-2
accessing, (I), 8-2, 8-8
access latency, (I), 8-14
address size, (I), 8-14
flow control, (I), 8-3
read/write ordering, (I), 8-9
Remote writes (mailbox), (I), 8-5
Remove from queue PALcode instructions
longword, (II), 2-72
longword at head interlocked, (II), 2-52
longword at head interlocked resident, (II), 2-55
longword at tail interlocked, (II), 2-62
longword at tail interlocked resident, (II), 2-65
quadword, (II), 2-74
quadword at head interlocked, (II), 2-57
quadword at head interlocked resident, (II), 2-60
quadword at tail interlocked, (II), 2-67
quadword at tail interlocked resident, (II), 2-70
REMQHIL (PALcode) instruction, (II), 2-52
REMQHILR (PALcode) instruction, (II), 2-55
REMQHIQ (PALcode) instruction, (II), 2-57
REMQHIQR (PALcode) instruction, (II), 2-60
REMQTIL (PALcode) instruction, (II), 2-62
REMQTILR (PALcode) instruction, (II), 2-65
REMQTIQ (PALcode) instruction, (II), 2-67
REMQTIQR (PALcode) instruction, (II), 2-70
REMQUEL (PALcode) instruction, (II), 2-72
REMQUEL/D (PALcode) instruction, (II), 2-72
REMQUEQ (PALcode) instruction, (II), 2-74
REMQUEQ/D (PALcode) instruction, (II), 2-74

Representative result, (I), 4-57
Reserved instructions, opcodes for, C-10
Reserved operand, (I), 4-58
RESET_ENV console routine, (IV), 2-55
RESET_TERM console routine, (IV), 2-38
Restart-capable (RC) processor state flag, (IV), 3-14
RESTORE_TERM console routine, (IV), 3-32
Result latency, A-5
RET instruction, (I), 4-20
retsys (PALcode) instruction, (III), 2-12
PS with, (III), 5-2
ROM boot block structure, (IV), 3-38
ROM bootstrapping, (IV), 3-38
implementation considerations, (IV), 3-50
Rounding modes
See Floating-point rounding modes
RPCC (read process cycle counter) instruction, (I), 4-118
RSCC instruction with, (II), 2-18
RS (read and set) instruction, (I), 4-122
RSCC (PALcode) instruction, (II), 2-17
RPCC instruction with, (II), 2-18
rti (PALcode) instruction, (III), 2-13
PS with, (III), 5-2
with exceptions, (III), 5-1


S 


S4ADDL instruction, (I), 4-24
S4ADDQ instruction, (I), 4-26
S4SUBL instruction, (I), 4-33
S4SUBQ instruction, (I), 4-35
S8ADDL instruction, (I), 4-24
S8ADDQ instruction, (I), 4-26
S8SUBL instruction, (I), 4-33
S8SUBQ instruction, (I), 4-35
SAVE_ENV console routine, (IV), 2-56
SAVE_TERM console routine, (IV), 3-31
SBZ (should be zero), (I), 1-9
SCC
See System cycle counter
Secondary processors
at multiprocessor boot, (IV), 3-20
definition, (IV), 1-1
modes, (IV), 3-4
Security holes, (I), 1-7
with UNPREDICTABLE results, (I), 1-8
Seg0, mapping of, (III), 3-1
Seg1, mapping of, (III), 3-1
Segment number fields, (II), 3-2
Self-relative longword queue, (II), 2-21
Self-relative quadword queue, (II), 2-26
Sequential read/write, A-10
Serialization, MB instruction with, (I), 4-117
SET_ENV console routine, (IV), 2-58
SET_TERM_CTL console routine, (IV), 2-39
SET_TERM_INT console routine, (IV), 2-40
Shared data (multiprocessor), A-7
changed vs. updated datum, (I), 5-5
Shared data structures
atomic update, (I), 5-6
ordering considerations, (I), 5-7
using memory barrier (MB) instruction, (I), 5-8
Shared memory
accessing, (I), 5-10
access sequence, (I), 5-10
defined, (I), 5-10
issue sequence, (I), 5-10
Shift arithmetic instructions, (I), 4-41
Shift logical instructions, (I), 4-40
Single-precision floating-point, (I), 4-64
SLL instruction, (I), 4-40
Software (SW) field, in PS register, (II), 6-6
Software completion bit, (II), 6-13
Software considerations, A-1
See also Performance optimizations
Software interrupt request (SIRR) register
described, (II), 5-20
interrupt arbitration, (II), 6-35
protocol for, (II), 6-19
with interrupts, (II), 6-19
Software interrupts, (II), 6-19
asynchronous system traps (AST), (II), 6-20
protocol between summary and request, (II), 6-19
recording pending state of, (II), 5-21
request (SIRR) register, (II), 6-19
requesting, (II), 5-20
service routine entry points, (II), 6-29
summary (SISR) register, (II), 6-19
supported levels of, (II), 5-20
Software interrupt summary (SISR) register
described, (II), 5-21
protocol for, (II), 6-19
with interrupts, (II), 6-19
Software traps, generating, (II), 2-10
/S opcode qualifier
IEEE floating-point, (I), 4-61
VAX floating-point, (I), 4-61
SP
See Stack pointer
SRA instruction, (I), 4-41
SRL instruction, (I), 4-40


Stack alignment, (II), 6-31
Stack alignment (SP_ALIGN)
field in saved PS, (II), 6-6
Stack frames, (II), 6-7; (III), 5-3
Stack pointer (SP)
defined, (II), 1-1; (III), 1-4
register linkage for, (III), 1-1
Stack pointer internal processor registers, (II), 5-1
Starvation, (I), 8-4
STATUS field (mailbox), (I), 8-6
STF instruction, (I), 4-73
when data is unaligned, (II), 6-28
STG instruction, (I), 4-74
when data is unaligned, (II), 6-28
STL instruction, (I), 4-13
when data is unaligned, (II), 6-28
STL_C instruction, (I), 4-11
when data is unaligned, (II), 6-28
with LDx_L instruction, (I), 4-11
with processor lock register/flag, (I), 4-11
Store instructions
See also Floating-point store instructions
emulation of, (I), 4-2
FETCH instruction, (I), 4-115
multiprocessor environment, (I), 5-5
serialization, (I), 4-117
store longword, (I), 4-13
store longword conditional, (I), 4-11
store quadword, (I), 4-13
store quadword conditional, (I), 4-11
store unaligned quadword, (I), 4-14
when data is unaligned, (II), 6-28
Store memory integer instructions (list), (I), 4-4
STQ instruction, (I), 4-13
when data is unaligned, (II), 6-28
STQP (PALcode) instruction, (II), 2-88
STQ_C instruction, (I), 4-11
use in accessing MBPR, (I), 8-3, 8-15
with LDx_L inst., (I), 4-11
with processor lock register/flag, (I), 4-11
STQ_L instruction
when data is unaligned, (II), 6-28
STQ_U instruction, (I), 4-14
STS instruction, (I), 4-75
when data is unaligned, (II), 6-28
STT instruction, (I), 4-76
when data is unaligned, (II), 6-28
SUBF instruction, (I), 4-109
SUBG instruction, (I), 4-109
SUBL instruction, (I), 4-32
SUBQ instruction, (I), 4-34
SUBS instruction, (I), 4-111
SUBT instruction, (I), 4-111
Subtract instructions
See also Floating-point operate
subtract longword, (I), 4-32
subtract quadword, (I), 4-34
subtract scaled longword, (I), 4-33
subtract scaled quadword, (I), 4-35
Supervisor read enable (SRE), bit in PTE, (II), 3-6
Supervisor stack pointer (SSP)
as internal processor register, (II), 5-1
in HWPCB, (II), 4-2
Supervisor stack pointer (SSP) register, (II), 5-28
Supervisor write enable (SWE), bit in PTE, (II), 3-6
SWASTEN (PALcode) instruction, (II), 2-19
interrupt arbitration, (II),
with ASTEN register, (II), 5-6
SWC bit
exception summary parameter, (II), 6-13
exception summary register, (III), 5-2, 5-4
swpctx (PALcode) instruction, (III), 2-14
PCB with, (III), 4-1
with ASNs, (III), 3-8
SWPCTX (PALcode) instruction, (II), 2-89
with ASTSR register, (II), 5-8
swpipl (PALcode) instruction, (III), 2-16
PS with, (III), 5-2

Synchronous traps, (III), 5-2
data alignment, (II), 6-15
defined, (II), 6-9
program counter (PC) value, (II), 6-15
REI instruction with, (II), 6-15
System call entry (entSys) register, (III), 1-3, 5-4, 5-8
System control block (SCB)
arithmetic trap entry points, (II), 6-27
fault entry points, (II), 6-27
finding PFN, (II), 5-19
saved on stack frame, (II), 6-7
structure of, (II), 6-26
with memory management faults, (II), 3-14
System control block base (SCBB) register, (II), 5-19
System crash, requesting, (IV), 3-27
System cycle counter (SCC), reading, (II), 2-17
System entry addresses, (III), 5-3
System initialization, (IV), 3-4
System restarts, (IV), 3-25
error halt and recovery, (IV), 3-26
forcing console I/O mode, (IV), 3-32
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System restarts (cont'd) | 

powerfail and recovery (multiprocessor), 
(IV), 3-25 | 

powerfail and recovery (split), (IV), 3-26 

powerfail and recovery (uniprocessor), (IV), | 
3-25 - 

powerfail and recovery (united), (IV), 3-25 

primary switching, (IV), 3-28 

requesting a crash, (IV), 3-27 

RESTORE_TERM routine, (7V), 3-32 

restoring terminal state, (JV), 3-30 

SAVE_TERM routine, (IV), 3~31 

saving terminal state, (IV), 3-30 


System value (sysvalue) register, (III), 1-4


S_floating data type
alignment of, (I), 2-8
compared to F_floating, (I), 2-8
exceptions, (I), 2-8
format, (I), 2-8
mapping, (I), 2-8
MAX/MIN, (I), 4-58
operations, (I), 4-64
when data is unaligned, (II), 6-28


T 


TB
See Translation buffer
tbi (PALcode) instruction, (III), 2-17
with TBs, (III), 3-8
Tightly coupled I/O bus, (I), 8-1
Timeout, (I), 8-4
Timing considerations, atomic sequences, A-17
Translation
physical, (III), 3-6
virtual, (III), 3-7
Translation buffer (TB), (III), 3-8
address space number with, (II), 3-11
fault on execute, (II), 6-12
fault on read, (II), 6-11
fault on write, (II), 6-11
granularity hint in PTE, (II), 3-5
hint block in HWRPB, (IV), 2-10
with invalid PTEs, (II), 3-12
Translation buffer check (TBCHK) register
described, (II), 5-22
with translation buffer, (II), 3-12
Translation buffer hint block, (IV), 2-10
Translation buffer invalidate all (TBIA) register
described, (II), 5-24
with translation buffer, (II), 3-12
Translation buffer invalidate all process (TBIAP) register
described, (II), 5-25
with translation buffer, (II), 3-12
Translation buffer invalidate single (TBIS) register
described, (II), 5-26
Translation not valid fault, (II), 6-10
service routine entry point, (II), 6-27
Translation-not-valid fault, (III), 3-10
TRAPB (trap barrier) instruction, A-14
described, (I), 4-120
with MT_FPCR, (I), 4-66
with trap shadow, (I), 4-62
Trap handler, with non-finite arithmetic operands, (I), 4-63
Trap handling, IEEE floating-point, B-4
Trap modes
floating-point, (I), 4-60
IEEE, (I), 4-61
IEEE convert-to-integer, (I), 4-61
VAX, (I), 4-60
VAX convert-to-integer, (I), 4-61
Traps 
See Arithmetic traps 
Trap shadow, (III), 5-2
defined, (I), 4-62
defined for floating-point, (I), 4-58
trap handler requirement for, (I), 4-62
Trigger instruction, (III), 5-2
True result, (I), 4-57
True zero, (I), 4-57
TTY_DEV variable, (IV), 2-24
T_floating data type
alignment of, (I), 2-10
exceptions, (I), 2-10
format, (I), 2-9
MAX/MIN, (I), 4-59
when data is unaligned, (II), 6-28


U 


UMULH instruction, (I), 4-31
with MULQ, (I), 4-30
Unaligned access fault
system entry for, (III), 5-4
UNALIGNED data objects, (I), 1-9
Unaligned fault entry (entUna) register, (III), 1-3, 5-8
Unconditional long jump, (I), 4-21
UNDEFINED operations, (I), 1-7
Underflow trap, (II), 6-15; (III), 5-5
UNF bit
exception summary parameter, (II), 6-13
exception summary register, (III), 5-5
UNORDERED memory references, (I), 5-9
UNPREDICTABLE results, (I), 1-7
Unprivileged PALcode instructions
VAX compatibility, (II), 2-75


Unprivileged PALcode instructions (list), (III), 2-1

/U opcode qualifier
IEEE floating-point, (I), 4-61
VAX floating-point, (I), 4-61
Updated datum, (I), 5-5
User mode, protection code with, (II), 3-6
User read enable (URE)
bit in PTE, (II), 3-6; (III), 3-4
User stack pointer (USP)
defined, (III), 1-4
in HWPCB, (II), 4-2
internal processor register, (II), 5-1
User stack pointer (USP) register, (II), 5-29
User write enable (UWE)
bit in PTE, (II), 3-6; (III), 3-4


V 


Valid (V) 
bit in PTE, (II), 3-4; (III), 3-5
vasize, (III), 1-2 
VAX compatibility instructions, restrictions for, (I), 4-121
VAX compatibility register, (I), 3-2
VAX convert-to-integer trap mode, (I), 4-61
VAX floating-point
See also Floating-point instructions
D_floating, (I), 2-6
F_floating, (I), 2-3
G_floating, (I), 2-5
trap modes, (I), 4-62
VAX floating-point instructions
add instructions, (I), 4-88
compare instructions, (I), 4-91
convert from integer instructions, (I), 4-95
convert to integer instructions, (I), 4-94
convert VAX floating format instructions, (I), 4-96
divide instructions, (I), 4-102
multiply instructions, (I), 4-106
opcodes for, C-5
operate instructions, (I), 4-80
qualifiers, summarized, C-5
subtract instructions, (I), 4-109
VAX rounding modes, (I), 4-59
VAX trap modes, required instruction notation, (I), 4-61
Virtual address format, (II), 3-2
segment number fields, (II), 3-2
Virtual address space
minimum and maximum, (II), 3-2
page size with, (II), 3-1
Virtual address spaces, (III), 3-1
Virtual address translation, (II), 3-10
Virtual D-cache, (I), 5-3 
maintaining coherency of, (I), 5-3
Virtual format, (II), 3-2
Virtual I-cache, (I), 5-3
maintaining coherency of, (I), 5-5
Virtual machine monitor (VMM), bit in PS register, (II), 6-6


Virtual page table base (VPTB) register, (II), 5-30
Virtual page table pointer (VPTPTR), (III)
/V opcode qualifier
IEEE floating-point, (I), 4-61
VAX floating-point, (I), 4-61
W 
Warm bootstrapping, (IV), 3-18 
Watchpoints
with fault on read, (II), 6-11
with fault on write, (II), 6-11
WDATA field (mailbox), (I), 8-6
W field (mailbox), (I), 8-5
Whami, (III), 1-4
whami (PALcode) instruction, (III), 2-18
Who-Am-I (WHAMI) register, (II), 5-31
WHO_ARE_YOU command, (I), 8-13, D-4
Word data type, (I), 2-1
wrent (PALcode) instruction, (III), 2-19
wrfen (PALcode) instruction, (III), 2-21
Write-back caches, requirements for, (I), 5-4
Write buffers, requirements for, (I), 5-4
WRITE console routine, (IV), 2-51
WRITE_UNQ (PALcode) instruction, (II), 2-82
wrkgp (PALcode) instruction, (III), 2-22
wrunique (PALcode) instruction, (III), 2-7
PCB with, (III), 4-1
required recognition of, (I), 6-4
WRUNIQUE (PALcode) instruction
required recognition of, (I), 6-4
wrusp (PALcode) instruction, (III), 2-23
PCB with, (III), 4-1
wrval (PALcode) instruction, (III), 2-24
wrvptptr (PALcode) instruction, (III), 2-25
WR_PS_SW (PALcode) instruction, (II), 2-20


X 


XMI CMD field, D-4
XOR instruction, (I), 4-37


Z


ZAP instruction, (I), 4-55
ZAPNOT instruction, (I), 4-55
Zero byte instructions (list), (I), 4-55





